kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

urllib/wget download #59

Closed: kfuku52 closed this issue 1 year ago

kfuku52 commented 3 years ago

> This service may not be publicly available in your country

I'm not sure why this "ERROR" message on stdout implies that the country is the problem. Is it related to URLError in some way?

Also, please remove the dependency on the Python package wget. We should call the wget command directly instead of using the library. Or is there a problem with subprocess.run()?

        import os
        import urllib.error
        import urllib.request

        sra_path = os.path.join(work_dir, sra_id + '.sra')
        dl_status = 'failed'
        for sra_source in sra_source_list:
            print("trying to fetch {} from {}".format(sra_id, sra_source))
            try:
                urllib.request.urlretrieve(str(sra_source), sra_path)
                dl_status = 'success'
                break  # stop at the first source that works
            except urllib.error.URLError:
                print("ERROR: urllib.request did not work. Trying wget")
                try:
                    import wget  # the PyPI 'wget' package, not the wget command
                    wget.download(str(sra_source), sra_path)
                    dl_status = 'success'
                    break
                except ModuleNotFoundError:
                    print("ERROR: Could not find wget")
                except urllib.error.URLError:
                    print(
                        "ERROR: Could not download from " + str(sra_source) + ". This service may not be publicly available in your country.")
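A minimal sketch of the requested change, calling the wget command through subprocess.run() instead of importing the PyPI wget package. The function names wget_command and download_with_wget are hypothetical (not part of amalgkit), and the --tries value is an arbitrary assumption.

```python
import os
import shutil
import subprocess
from typing import List


def wget_command(url: str, out_path: str, tries: int = 3) -> List[str]:
    """Build the argument list for the wget command-line tool (hypothetical helper)."""
    # -O writes the download to out_path; --tries caps the number of attempts.
    return ["wget", "--tries={}".format(tries), "-O", out_path, url]


def download_with_wget(url: str, out_path: str) -> bool:
    """Fetch url into out_path with the wget command; return True on success."""
    if shutil.which("wget") is None:
        print("ERROR: the wget command was not found on PATH")
        return False
    # Passing a list (no shell=True) avoids shell quoting issues with URLs.
    result = subprocess.run(wget_command(url, out_path))
    if result.returncode != 0:
        print("ERROR: wget exited with code {} for {}".format(result.returncode, url))
        # wget -O may leave an empty or partial file behind; remove it.
        if os.path.exists(out_path):
            os.remove(out_path)
        return False
    return True
```

In the loop above, a call like download_with_wget(str(sra_source), ...) would take the place of the `import wget` / `wget.download(...)` pair, and a missing wget binary is reported instead of a missing Python module.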
Hego-CCTB commented 3 years ago

The availability of a link, especially for Google and Amazon, depends on the country. For example, some GWS data are only available from within the US, while others are available worldwide. During testing I noticed that when trying to download from a GWS link that is only available in the US, urllib.request throws a URLError within Python.

I used the "in-house" python wget, because I felt this was a more 'pythonic' approach. I will change this to subprocess.run(), no problem.

kfuku52 commented 3 years ago

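I'm afraid that URLError can be caused by many problems (for example DNS failures, refused connections, timeouts, or HTTP errors such as 403 or 404, because HTTPError is a subclass of URLError), so the error message would be misleading whenever the cause isn't the country.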
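One way to avoid the misleading message is to report the cause that the exception already carries. A sketch with a hypothetical helper, describe_url_error; it relies only on the standard attributes of urllib.error.HTTPError (code, reason) and urllib.error.URLError (reason).

```python
import urllib.error


def describe_url_error(err: urllib.error.URLError) -> str:
    """Return a short, human-readable cause for a URLError (hypothetical helper)."""
    # HTTPError is a subclass of URLError, so check for it first.
    if isinstance(err, urllib.error.HTTPError):
        return "HTTP {} ({})".format(err.code, err.reason)
    # Plain URLError: reason is a string or a wrapped OSError (DNS, refused, timeout).
    return str(err.reason)
```

With `except urllib.error.URLError as e:`, the message could print describe_url_error(e) next to the source URL. An HTTP 403 may hint at a regional restriction, but it is not proof of one.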

> I used the "in-house" python wget, because I felt this was a more 'pythonic' approach.

That pythonic approach uses urllib, and you've already implemented a urllib-based download. You can see how the wget package is implemented here: https://pypi.org/project/wget/#files

kfuku52 commented 2 years ago

@Hego-CCTB I just found that the package wget, not the command wget, is still used in getfastq. Is there any reason to use it?

Hego-CCTB commented 2 years ago

If there was a reason, I've forgotten it. I'll change it to use subprocess.run() with the wget command.

Hego-CCTB commented 2 years ago

Done. I'll push the update later today.

Hego-CCTB commented 1 year ago

This was done; I just never came back to close the issue.