pepkit / geofetch

Builds a PEP from SRA or GEO accessions
https://pep.databio.org/geofetch/
BSD 2-Clause "Simplified" License

fetch_all does not work #130

Closed: kapedalex closed this issue 7 months ago

kapedalex commented 8 months ago
from geofetch import Geofetcher

geof = Geofetcher(just_metadata=False,
                  processed=False,
                  max_soft_size="12GB",
                  data_source="all")

geof.fetch_all("GSE95654")

Running geofetch -i GSE95654 from the command line gives the same response:

The system cannot find the path specified. '{' is not recognized as an internal or external command, operable program or batch file. To download raw data You must first install the sratoolkit, with prefetch in your PATH. Installation instruction: http://geofetch.databio.org/en/latest/install/

The problem is that sratoolkit is actually installed and added to PATH. I can run prefetch GSE95654 in the terminal and it works fine.

Basic guides like

find_gse = Finder()
gse_list = find_gse.get_gse_all()

and

geof = Geofetcher(processed=True, acc_anno=True, discard_soft=True)
geof.get_projects("GSE160204")

work fine.
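Being able to run prefetch in a terminal does not by itself show that the Python process running geofetch sees the same PATH (for example, inside a virtual environment or an IDE). Below is a minimal standard-library sketch for checking this. `find_tool` is a hypothetical helper name, not part of geofetch.

```python
import os
import shutil
import sys
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of executable `name` if it is on PATH, else None.

    shutil.which also honours PATHEXT on Windows, so "prefetch" can
    match "prefetch.exe".
    """
    return shutil.which(name)


if __name__ == "__main__":
    # Which interpreter (and therefore which environment) is being checked.
    print(f"Python executable: {sys.executable}")
    path = find_tool("prefetch")
    if path is None:
        print("prefetch was NOT found on PATH for this Python process.")
        # List the directories this process actually searches.
        for entry in os.environ.get("PATH", "").split(os.pathsep):
            print(f"  {entry}")
    else:
        print(f"prefetch found at: {path}")
```

Run it with the same interpreter that runs geofetch, for example inside the activated virtual environment.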

khoroshevskyi commented 8 months ago

Thank you for raising an issue. Could you please provide us with the full geofetch log and information about the system on which you are running geofetch, including the Python version?

I have just installed prefetch and geofetch, and unfortunately I can't reproduce the error. Also note that prefetch can now be installed with the following command:

sudo apt install sra-toolkit

pedro-w commented 8 months ago

I have also seen this, using the command:

> geofetch --verbosity 5 -i GSE135644 -m dl
[INFO] [15:46:54] Metadata folder: D:\Libraries\lda\dl\GSE135644
The system cannot find the path specified.
'{' is not recognized as an internal or external command,
operable program or batch file.
To download raw data You must first install the sratoolkit, with prefetch in your PATH. Installation instruction: http://geofetch.databio.org/en/latest/install/

My Python version is:

Python 3.11.7 (tags/v3.11.7:fa7a6f2, Dec  4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]

(Windows 11)

khoroshevskyi commented 8 months ago

Thank you for your response, @pedro-w. Unfortunately, geofetch hasn't been tested on Windows. We will try to solve this issue ASAP.

pedro-w commented 8 months ago

Thanks for the swift response. I don't know if @kapedalex is also on Windows?

I'm happy to help test anything, just let me know.

pedro-w commented 8 months ago

I'll just add that it does work in WSL (Debian). I'm guessing it assumes a POSIX shell somewhere, which fails under Windows?
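This guess fits the error text. `'{' is not recognized as an internal or external command...` is the message Windows `cmd.exe` prints when a command starts with a token it does not know, such as the opening brace of a POSIX `{ ...; }` group. That would happen if a check were run through a shell (for example with `shell=True` or `os.system`). This is an assumption, not a confirmed diagnosis of geofetch's code. Below is a minimal sketch of a shell-free check. `can_run` is a hypothetical helper, and it assumes prefetch accepts `--version`.

```python
import subprocess
import sys
from typing import Sequence


def can_run(cmd: Sequence[str]) -> bool:
    """Return True if `cmd` can be launched and exits with status 0.

    The command is passed as an argument list with shell=False (the
    default), so neither sh nor cmd.exe parses it and POSIX-only
    syntax never comes into play.
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # Executable not found on PATH, or not executable.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print(f"platform: {sys.platform}")
    # Assumption: prefetch supports --version, like other sra-tools binaries.
    print("prefetch runnable:", can_run(["prefetch", "--version"]))
```

Passing the command as a list avoids quoting and syntax differences between shells. The trade-off is losing shell features such as pipes and redirection, which a simple "is this tool installed?" check does not need.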

khoroshevskyi commented 8 months ago

Sorry that this is taking so long. Unfortunately, I don't have access to a Windows laptop every day. However, I have identified the error, and it appears to occur within one of the imported libraries. I will continue working on resolving this issue next week.

pedro-w commented 7 months ago

Hi. I tried your PR #132 and it seemed to work (see below), but it's still printing the

The system cannot find the path specified. 
'{' is not recognized as an internal or external command, operable program or batch file.

lines. I don't know where these are coming from, but if they're nothing to worry about, then I'm fine with that.


(venv) PS D:\temp\geofetch> python -m geofetch -i GSE67303 -n red_algae -m d:\temp\gf-test
[INFO] [09:05:26] Metadata folder: d:\temp\gf-test\red_algae
The system cannot find the path specified.
'{' is not recognized as an internal or external command,
operable program or batch file.
[WARNING] [09:05:26] GEOfetch is not checking if prefetch is installed on Windows, please make sure it is installed and in your PATH, otherwise it will not be possible to download raw data.
[INFO] [09:05:26] Trying GSE67303 (not a file) as accession...
[INFO] [09:05:26] Skipped 0 accessions. Starting now.
[INFO] [09:05:26] Processing accession 1 of 1: 'GSE67303'
[INFO] [09:05:26] Found previous GSE file: d:\temp\gf-test\red_algae\GSE67303_GSE.soft
[INFO] [09:05:26] Found previous GSM file: d:\temp\gf-test\red_algae\GSE67303_GSM.soft
[INFO] [09:05:26] Processed 4 samples.
[INFO] [09:05:26] Expanding metadata list...
[INFO] [09:05:26] Found SRA Project accession: SRP056574
[INFO] [09:05:26] Found SRA metadata, opening..
[INFO] [09:05:26] Parsing SRA file to download SRR records
[INFO] [09:05:26] Getting SRR: SRR1930183  in (GSE67303)

2024-01-22T09:05:28 prefetch.3.0.10: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-01-22T09:05:30 prefetch.3.0.10: 1) Downloading 'SRR1930183'...
2024-01-22T09:05:30 prefetch.3.0.10: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-01-22T09:05:30 prefetch.3.0.10:  Downloading via HTTPS...
2024-01-22T09:06:06 prefetch.3.0.10:  HTTPS download succeed
2024-01-22T09:06:06 prefetch.3.0.10:   verifying 'SRR1930183'...
2024-01-22T09:06:06 prefetch.3.0.10:  'SRR1930183' is valid
2024-01-22T09:06:06 prefetch.3.0.10: 1) 'SRR1930183' was downloaded successfully
2024-01-22T09:06:06 prefetch.3.0.10: 'SRR1930183' has 0 unresolved dependencies
[INFO] [09:06:06] Getting SRR: SRR1930184  in (GSE67303)

2024-01-22T09:06:08 prefetch.3.0.10: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-01-22T09:06:09 prefetch.3.0.10: 1) Downloading 'SRR1930184'...
2024-01-22T09:06:09 prefetch.3.0.10: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-01-22T09:06:09 prefetch.3.0.10:  Downloading via HTTPS...
2024-01-22T09:06:27 prefetch.3.0.10:  HTTPS download succeed
2024-01-22T09:06:27 prefetch.3.0.10:   verifying 'SRR1930184'...
2024-01-22T09:06:27 prefetch.3.0.10:  'SRR1930184' is valid
2024-01-22T09:06:27 prefetch.3.0.10: 1) 'SRR1930184' was downloaded successfully
2024-01-22T09:06:27 prefetch.3.0.10: 'SRR1930184' has 0 unresolved dependencies
[INFO] [09:06:27] Getting SRR: SRR1930185  in (GSE67303)

2024-01-22T09:06:29 prefetch.3.0.10: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-01-22T09:06:29 prefetch.3.0.10: 1) Downloading 'SRR1930185'...
2024-01-22T09:06:29 prefetch.3.0.10: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-01-22T09:06:29 prefetch.3.0.10:  Downloading via HTTPS...
2024-01-22T09:06:40 prefetch.3.0.10:  HTTPS download succeed
2024-01-22T09:06:40 prefetch.3.0.10:   verifying 'SRR1930185'...
2024-01-22T09:06:40 prefetch.3.0.10:  'SRR1930185' is valid
2024-01-22T09:06:40 prefetch.3.0.10: 1) 'SRR1930185' was downloaded successfully
2024-01-22T09:06:41 prefetch.3.0.10: 'SRR1930185' has 0 unresolved dependencies
[INFO] [09:06:41] Getting SRR: SRR1930186  in (GSE67303)

2024-01-22T09:06:42 prefetch.3.0.10: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-01-22T09:06:43 prefetch.3.0.10: 1) Downloading 'SRR1930186'...
2024-01-22T09:06:43 prefetch.3.0.10: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-01-22T09:06:43 prefetch.3.0.10:  Downloading via HTTPS...
2024-01-22T09:06:51 prefetch.3.0.10:  HTTPS download succeed
2024-01-22T09:06:51 prefetch.3.0.10:   verifying 'SRR1930186'...
2024-01-22T09:06:51 prefetch.3.0.10:  'SRR1930186' is valid
2024-01-22T09:06:51 prefetch.3.0.10: 1) 'SRR1930186' was downloaded successfully
2024-01-22T09:06:52 prefetch.3.0.10: 'SRR1930186' has 0 unresolved dependencies
[INFO] [09:06:52] Finished processing 1 accession(s)
[INFO] [09:06:52] Creating complete project annotation sheets and config file...
[INFO] [09:06:52] Sample annotation sheet: d:\temp\gf-test\red_algae\GSE67303_PEP\GSE67303_PEP_raw.csv . Saved!
[INFO] [09:06:52] File has been saved successfully
[INFO] [09:06:52]   Config file: d:\temp\gf-test\red_algae\GSE67303_PEP\GSE67303_PEP.yaml

khoroshevskyi commented 7 months ago

@pedro-w Thank you for your support. I have just released a new version of geofetch, v0.12.6. If you have a chance, could you please confirm that everything works?

pedro-w commented 7 months ago

@khoroshevskyi Apologies for the delay. I tried a clone of the GitHub tag v0.12.6 on Windows 11.

So, I can confirm it's working as expected for me.

Thanks 👍

khoroshevskyi commented 7 months ago

Thank you very much for your help!

pedro-w commented 7 months ago

@kapedalex did it fix your issue too 🤞 ?