rpetit3 / fastq-dl

Download FASTQ files from SRA or ENA repositories.
MIT License
280 stars 25 forks

prefer sra normalized format over sra lite #23

Open kapsakcj opened 1 year ago

kapsakcj commented 1 year ago

I've hit an odd issue where fastq-dl pulls FASTQs without issue, but they are in SRA Lite format instead of the typical SRA Normalized format.

FASTQs in SRA Lite format have ? as the quality score for every base, which equates to Q30 under Phred+33 encoding. This causes problems: trimmomatic and other typical downstream tools cannot detect the Phred quality encoding, and the quality scores are not useful during assembly (and probably in other applications that rely on them).

FASTQs in SRA Normalized format are the original format, which contains the full base quality scores.
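To make the ? = Q30 point concrete, here is a minimal sketch of decoding a FASTQ quality line, assuming Phred+33 encoding. The function names are made up for illustration and are not part of fastq-dl or sra-tools.

```python
def phred33_scores(quality_line: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_line]


def is_uniform_quality(quality_line: str) -> bool:
    """Return True when every base in the read carries the same score."""
    return len(set(quality_line)) == 1


# Hypothetical SRA Lite style read: every quality character is '?'.
lite_read = "?" * 8
print(phred33_scores(lite_read))
print(is_uniform_quality(lite_read))
```

A read with real per-base scores would decode to a mix of values and fail the uniformity check, which is the information a quality trimmer needs.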

Some examples where I encountered this issue

I'm guessing it will be a big effort, but would it be possible for fastq-dl to download the SRA-normalized format of FASTQs?

Not sure how ENA deals with this issue, but sra-toolkit has an option for using this format.

More info:

rpetit3 commented 1 year ago

So good news and bad news...

Good news first:

fastq-dl --accession SRR25316086 --verbose --outdir sra-normalized --provider sra --only-provider
...
2023-08-02 22:40:31 DEBUG    2023-08-02 22:40:31:executor.process:DEBUG - Got return code 0 from synchronous process (bash -c 'prefetch SRR25316086 --max-size 10T -o SRR25316086.sra').                                                                      __init__.py:1638
                    DEBUG    2023-08-02 22:40:31:root:DEBUG -                                                                                                                                                                                                   fastq_dl.py:92

                    DEBUG    2023-08-02 22:40:31:root:DEBUG - 2023-08-02T22:40:25 prefetch.3.0.6: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.                                                              fastq_dl.py:93
                             2023-08-02T22:40:25 prefetch.3.0.6: 1) Downloading 'SRR25316086'...
                             2023-08-02T22:40:25 prefetch.3.0.6: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
                             2023-08-02T22:40:25 prefetch.3.0.6:  Downloading via HTTPS...
                             2023-08-02T22:40:30 prefetch.3.0.6:  HTTPS download succeed
                             2023-08-02T22:40:31 prefetch.3.0.6:  'SRR25316086' is valid
                             2023-08-02T22:40:31 prefetch.3.0.6: 1) 'SRR25316086' was downloaded successfully
                             2023-08-02T22:40:31 prefetch.3.0.6: 'SRR25316086' has 0 unresolved dependencies
...
zcat sra-normalized/SRR25316086_1.fastq.gz | fastq-scan -q
{
    "qc_stats": {
        "total_bp": 150934381,
        "coverage": 0.00,
        "read_total": 1010912,
        "read_min": 35,
        "read_mean": 149.305,
        "read_std": 9.49214,
        "read_median": 151,
        "read_max": 151,
        "read_25th": 150,
        "read_75th": 151,
        "qual_min": 2,
        "qual_mean": 36.9532,
        "qual_std": 2.18818,
        "qual_max": 38,
        "qual_median": 38,
        "qual_25th": 37,
        "qual_75th": 38
    }
}

Q-scores range from 2 to 38, so this should get you what you need. However, I can add a way for the user to switch between SRA Normalized and SRA Lite, with Normalized as the default.

Now the bad news:

fastq-dl --accession SRR25316086 --verbose --outdir ena
...
zcat ena/SRR25316086_1.fastq.gz | fastq-scan -q
{
    "qc_stats": {
        "total_bp": 150934381,
        "coverage": 0.00,
        "read_total": 1010912,
        "read_min": 35,
        "read_mean": 149.305,
        "read_std": 9.49214,
        "read_median": 151,
        "read_max": 151,
        "read_25th": 150,
        "read_75th": 151,
        "qual_min": 30,
        "qual_mean": 30,
        "qual_std": 0,
        "qual_max": 30,
        "qual_median": 30,
        "qual_25th": 30,
        "qual_75th": 30
    }
}

# Force SRA Lite
vdb-config --simplified-quality-scores yes
fastq-dl --accession SRR25316086 --verbose --outdir sra-lite --provider sra --only-provider
...
                    DEBUG    2023-08-02 22:44:13:root:DEBUG - 2023-08-02T22:44:11 prefetch.3.0.6: Current preference is set to retrieve SRA Lite files with simplified base quality scores.                                                                     fastq_dl.py:93
                             2023-08-02T22:44:11 prefetch.3.0.6: 1) Downloading 'SRR25316086.lite'...
                             2023-08-02T22:44:11 prefetch.3.0.6: SRA Lite file is being retrieved, if this is different from your preference, it may be due to current file availability.
                             2023-08-02T22:44:11 prefetch.3.0.6:  Downloading via HTTPS...
                             2023-08-02T22:44:12 prefetch.3.0.6:  HTTPS download succeed
                             2023-08-02T22:44:13 prefetch.3.0.6:  'SRR25316086.lite' is valid
                             2023-08-02T22:44:13 prefetch.3.0.6: 1) 'SRR25316086.lite' was downloaded successfully
                             2023-08-02T22:44:13 prefetch.3.0.6: 'SRR25316086' has 0 unresolved dependencies
...
zcat sra-lite/SRR25316086_1.fastq.gz | fastq-scan -q
{
    "qc_stats": {
        "total_bp": 150934381,
        "coverage": 0.00,
        "read_total": 1010912,
        "read_min": 35,
        "read_mean": 149.305,
        "read_std": 9.49214,
        "read_median": 151,
        "read_max": 151,
        "read_25th": 150,
        "read_75th": 151,
        "qual_min": 30,
        "qual_mean": 30,
        "qual_std": 0,
        "qual_max": 30,
        "qual_median": 30,
        "qual_25th": 30,
        "qual_75th": 30
    }
}

It looks like ENA synced the SRA Lite version of the reads, not the Normalized one. This was also the case for SRR13086318.

Hmmm, this bugs me because I usually use ENA as the default provider, since they provide FASTQs directly. But I also want the original quality scores, which reads synced from SRA may or may not provide. The blog post above is dated October 2021, so I'm unsure whether reads synced from SRA to ENA after that date carry the SRA Lite Q scores.

I'm wondering if a solution might be to add a third provider, source, which downloads from the original provider based on the accession (e.g. SRR from SRA, ERR from ENA, DRR from either SRA or ENA).
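A rough sketch of how that prefix-based lookup could work, purely as an illustration. The function name, the return shape, and the order tried for DRR are assumptions, not how fastq-dl implements it.

```python
# Hypothetical mapping from run accession prefix to the provider(s) to try.
# The DRR ordering is an assumption; the idea above only says "either".
SOURCE_PROVIDERS = {
    "SRR": ["sra"],
    "ERR": ["ena"],
    "DRR": ["sra", "ena"],
}


def source_providers(accession: str) -> list[str]:
    """Return the provider(s) to try for a run accession, in order."""
    prefix = accession[:3].upper()
    if prefix not in SOURCE_PROVIDERS:
        raise ValueError(f"Unrecognized run accession prefix: {accession!r}")
    return SOURCE_PROVIDERS[prefix]


print(source_providers("SRR25316086"))
```

Study, sample, and experiment accessions would need to be resolved to runs first; this sketch only covers run accessions.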

mbhall88 commented 1 year ago

I'm wondering if a solution might be to add a third provider, source, which downloads from the original provider based on the accession (e.g. SRR from SRA, ERR from ENA, DRR from either SRA or ENA).

This sounds like the best first-pass solution to me.

kapsakcj commented 1 year ago

Thanks for the quick reply & brainstorming on solutions.

Just wanted to share this example where, despite using the options Robert suggested, it still seemed to download SRA Lite formatted FASTQs, even though the output explicitly states "SRA Normalized Format file is being retrieved". Maybe I just got unlucky with this particular accession?

# fastq-dl v2.0.2 installed via mamba
$ fastq-dl -a SRR13086318 --verbose --provider sra --only-provider
2023-08-03 10:09:37 DEBUG    2023-08-03 10:09:37:root:DEBUG - Querying ENA for metadata...                                                                                      fastq_dl.py:428
                    DEBUG    2023-08-03 10:09:37:root:DEBUG - --only-provider supplied, limiting queries to sra                                                                 fastq_dl.py:431
                    DEBUG    2023-08-03 10:09:37:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): eutils.ncbi.nlm.nih.gov:443                           connectionpool.py:1003
2023-08-03 10:09:39 DEBUG    2023-08-03 10:09:39:urllib3.connectionpool:DEBUG - https://eutils.ncbi.nlm.nih.gov:443 "POST /entrez/eutils/esearch.fcgi HTTP/1.1" 200 None  connectionpool.py:456
                    DEBUG    2023-08-03 10:09:39:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): eutils.ncbi.nlm.nih.gov:443                           connectionpool.py:1003
2023-08-03 10:09:40 DEBUG    2023-08-03 10:09:40:urllib3.connectionpool:DEBUG - https://eutils.ncbi.nlm.nih.gov:443 "GET                                                  connectionpool.py:456
                             /entrez/eutils/esummary.fcgi?db=sra&usehistory=n&retmode=json&query_key=1&WebEnv=MCID_64cbb522a06d0e3d496a66e5&retstart=0&retmax=500
                             HTTP/1.1" 200 None
                    DEBUG    2023-08-03 10:09:40:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): eutils.ncbi.nlm.nih.gov:443                           connectionpool.py:1003
2023-08-03 10:09:41 DEBUG    2023-08-03 10:09:41:urllib3.connectionpool:DEBUG - https://eutils.ncbi.nlm.nih.gov:443 "GET                                                  connectionpool.py:456
                             /entrez/eutils/esearch.fcgi?db=sra&usehistory=n&retmode=json&term=SRR13086318 HTTP/1.1" 200 None
                    DEBUG    2023-08-03 10:09:41:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): eutils.ncbi.nlm.nih.gov:443                           connectionpool.py:1003
2023-08-03 10:09:42 DEBUG    2023-08-03 10:09:42:urllib3.connectionpool:DEBUG - https://eutils.ncbi.nlm.nih.gov:443 "GET                                                  connectionpool.py:456
                             /entrez/eutils/efetch.fcgi?db=sra&usehistory=n&retmode=runinfo&query_key=1&WebEnv=MCID_64cbb5240bbbf858ca74f635&retstart=0&retmax=500
                             HTTP/1.1" 200 None
                    DEBUG    2023-08-03 10:09:42:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): www.ebi.ac.uk:443                                     connectionpool.py:1003
2023-08-03 10:10:00 DEBUG    2023-08-03 10:10:00:urllib3.connectionpool:DEBUG - https://www.ebi.ac.uk:443 "GET                                                            connectionpool.py:456
                             /ena/portal/api/filereport?result=read_run&fields=fastq_ftp&accession=SRP074197 HTTP/1.1" 200 None
2023-08-03 10:10:10 INFO     2023-08-03 10:10:10:root:INFO - Query: SRR13086318                                                                                                 fastq_dl.py:629
                    INFO     2023-08-03 10:10:10:root:INFO - Archive: sra                                                                                                       fastq_dl.py:630
                    INFO     2023-08-03 10:10:10:root:INFO - Total Runs To Download: 1                                                                                          fastq_dl.py:635
                    INFO     2023-08-03 10:10:10:root:INFO -         Working on run SRR13086318...                                                                              fastq_dl.py:654
                    DEBUG    2023-08-03 10:10:10:executor.process:DEBUG - Executing external command: bash -c 'prefetch SRR13086318 --max-size 10T -o SRR13086318.sra'         __init__.py:1475
                    DEBUG    2023-08-03 10:10:10:executor.process:DEBUG - Constructing subprocess.Popen object ..                                                              __init__.py:1483
                    DEBUG    2023-08-03 10:10:10:executor.process:DEBUG - Joining synchronous process using subprocess.Popen.communicate() ..                                  __init__.py:1504
2023-08-03 10:10:14 DEBUG    2023-08-03 10:10:14:executor.process:DEBUG - Got return code 0 from synchronous process (bash -c 'prefetch SRR13086318 --max-size 10T -o          __init__.py:1638
                             SRR13086318.sra').
                    DEBUG    2023-08-03 10:10:14:root:DEBUG -                                                                                                                    fastq_dl.py:92
                    DEBUG    2023-08-03 10:10:14:root:DEBUG - 2023-08-03T14:10:10 prefetch.3.0.3: Current preference is set to retrieve SRA Normalized Format files with full    fastq_dl.py:93
                             base quality scores.
                             2023-08-03T14:10:11 prefetch.3.0.3: 1) Downloading 'SRR13086318'...
                             2023-08-03T14:10:11 prefetch.3.0.3: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to
                             current file availability.
                             2023-08-03T14:10:11 prefetch.3.0.3:  Downloading via HTTPS...
                             2023-08-03T14:10:14 prefetch.3.0.3:  HTTPS download succeed
                             2023-08-03T14:10:14 prefetch.3.0.3:  'SRR13086318' is valid
                             2023-08-03T14:10:14 prefetch.3.0.3: 1) 'SRR13086318' was downloaded successfully
                             2023-08-03T14:10:14 prefetch.3.0.3: 'SRR13086318' has 0 unresolved dependencies

                    DEBUG    2023-08-03 10:10:14:executor.process:DEBUG - Executing external command: bash -c 'fasterq-dump SRR13086318 --split-3 --mem 1G --threads 1'        __init__.py:1475
                    DEBUG    2023-08-03 10:10:14:executor.process:DEBUG - Constructing subprocess.Popen object ..                                                              __init__.py:1483
                    DEBUG    2023-08-03 10:10:14:executor.process:DEBUG - Joining synchronous process using subprocess.Popen.communicate() ..                                  __init__.py:1504
2023-08-03 10:10:35 DEBUG    2023-08-03 10:10:35:executor.process:DEBUG - Got return code 0 from synchronous process (bash -c 'fasterq-dump SRR13086318 --split-3 --mem 1G     __init__.py:1638
                             --threads 1').
                    DEBUG    2023-08-03 10:10:35:root:DEBUG -                                                                                                                    fastq_dl.py:92
                    DEBUG    2023-08-03 10:10:35:root:DEBUG - spots read      : 841,910                                                                                          fastq_dl.py:93
                             reads read      : 1,683,820
                             reads written   : 1,683,820

                    DEBUG    2023-08-03 10:10:35:executor.process:DEBUG - Executing external command: bash -c 'pigz --force -p 1 -n SRR13086318*.fastq'                        __init__.py:1475
                    DEBUG    2023-08-03 10:10:35:executor.process:DEBUG - Constructing subprocess.Popen object ..                                                              __init__.py:1483
                    DEBUG    2023-08-03 10:10:35:executor.process:DEBUG - Joining synchronous process using subprocess.Popen.communicate() ..                                  __init__.py:1504
2023-08-03 10:12:39 DEBUG    2023-08-03 10:12:39:executor.process:DEBUG - Got return code 0 from synchronous process (bash -c 'pigz --force -p 1 -n SRR13086318*.fastq').      __init__.py:1638
                    DEBUG    2023-08-03 10:12:39:root:DEBUG -                                                                                                                    fastq_dl.py:92
                    DEBUG    2023-08-03 10:12:39:root:DEBUG -                                                                                                                    fastq_dl.py:93
                    INFO     2023-08-03 10:12:39:root:INFO - Writing metadata to /home/curtis_kapsak/fastq-run-info.tsv
$ zcat SRR13086318_1.fastq.gz |fastq-scan
{
    "qc_stats": {
        "total_bp": 192698955,
        "coverage": 0.00,
        "read_total": 841910,
        "read_min": 100,
        "read_mean": 228.883,
        "read_std": 38.6772,
        "read_median": 250,
        "read_max": 251,
        "read_25th": 222,
        "read_75th": 251,
        "qual_min": 3,
        "qual_mean": 29.9999,
        "qual_std": 0.0416146,
        "qual_max": 30,
        "qual_median": 30,
        "qual_25th": 30,
        "qual_75th": 30
    },
    "read_lengths": {

        "100": 1159,        "101": 1237,        "102": 1143,        "103": 990,        "104": 1329,
        "105": 1231,        "106": 1197,        "107": 1044,        "108": 1226,        "109": 1266,
        "110": 1199,        "111": 1124,        "112": 1253,        "113": 1170,        "114": 1132,
        "115": 1124,        "116": 1056,        "117": 1028,        "118": 1116,        "119": 1113,
        "120": 1139,        "121": 1197,        "122": 1168,        "123": 1148,        "124": 1285,
        "125": 1283,        "126": 1340,        "127": 1340,        "128": 1304,        "129": 1405,
        "130": 1337,        "131": 1402,        "132": 1283,        "133": 1420,        "134": 1419,
        "135": 1292,        "136": 1269,        "137": 1470,        "138": 1316,        "139": 1420,
        "140": 1266,        "141": 1525,        "142": 1364,        "143": 1362,        "144": 1303,
        "145": 1594,        "146": 1476,        "147": 1649,        "148": 1514,        "149": 1557,
        "150": 1462,        "151": 1724,        "152": 1417,        "153": 1816,        "154": 1795,
        "155": 1804,        "156": 1665,        "157": 2004,        "158": 2032,        "159": 2004,
        "160": 1640,        "161": 2193,        "162": 1625,        "163": 1670,        "164": 1613,
        "165": 1594,        "166": 1563,        "167": 1631,        "168": 1621,        "169": 1623,
        "170": 1573,        "171": 1633,        "172": 1654,        "173": 1899,        "174": 1673,
        "175": 1796,        "176": 1953,        "177": 1847,        "178": 1907,        "179": 1890,
        "180": 1864,        "181": 2255,        "182": 1879,        "183": 1819,        "184": 1945,
        "185": 1838,        "186": 1755,        "187": 1846,        "188": 1913,        "189": 1957,
        "190": 1962,        "191": 1870,        "192": 1929,        "193": 2045,        "194": 1976,
        "195": 1955,        "196": 2215,        "197": 2395,        "198": 2043,        "199": 2259,
        "200": 2263,        "201": 2345,        "202": 2163,        "203": 2153,        "204": 2315,
        "205": 2301,        "206": 2100,        "207": 2245,        "208": 2160,        "209": 2326,
        "210": 2242,        "211": 2238,        "212": 2724,        "213": 2575,        "214": 2492,
        "215": 2469,        "216": 2608,        "217": 2425,        "218": 2451,        "219": 2524,
        "220": 2715,        "221": 2799,        "222": 2629,        "223": 2698,        "224": 2714,
        "225": 2648,        "226": 2529,        "227": 2688,        "228": 2517,        "229": 2445,
        "230": 2549,        "231": 2554,        "232": 2485,        "233": 2465,        "234": 2700,
        "235": 2721,        "236": 2836,        "237": 2681,        "238": 3081,        "239": 3061,
        "240": 2965,        "241": 2855,        "242": 3209,        "243": 2864,        "244": 2955,
        "245": 2928,        "246": 3447,        "247": 5498,        "248": 18243,        "249": 39943,
        "250": 246256,        "251": 254088
    },
    "per_base_quality": {
        "1": 29.9999,        "2": 29.9999,        "3": 29.9999,        "4": 29.9999,        "5": 29.9999,
        "6": 29.9999,        "7": 29.9999,        "8": 29.9999,        "9": 29.9999,        "10": 29.9999,
        "11": 29.9999,        "12": 29.9999,        "13": 29.9999,        "14": 29.9999,        "15": 29.9999,
        "16": 29.9999,        "17": 29.9999,        "18": 29.9999,        "19": 29.9999,        "20": 29.9999,
        "21": 29.9999,        "22": 29.9999,        "23": 29.9999,        "24": 29.9999,        "25": 29.9999,
        "26": 29.9999,        "27": 29.9999,        "28": 29.9999,        "29": 29.9999,        "30": 29.9999,
        "31": 29.9999,        "32": 29.9999,        "33": 29.9999,        "34": 29.9999,        "35": 29.9999,
        "36": 29.9999,        "37": 29.9999,        "38": 29.9999,        "39": 29.9999,        "40": 29.9999,
        "41": 29.9999,        "42": 29.9999,        "43": 29.9999,        "44": 29.9999,        "45": 29.9999,
        "46": 29.9999,        "47": 29.9999,        "48": 29.9999,        "49": 29.9999,        "50": 29.9999,
        "51": 29.9999,        "52": 29.9999,        "53": 29.9999,        "54": 29.9999,        "55": 29.9999,
        "56": 29.9999,        "57": 29.9999,        "58": 29.9999,        "59": 29.9999,        "60": 29.9999,
        "61": 29.9999,        "62": 29.9999,        "63": 29.9999,        "64": 29.9999,        "65": 29.9999,
        "66": 29.9999,        "67": 29.9999,        "68": 29.9999,        "69": 29.9999,        "70": 29.9999,
        "71": 29.9999,        "72": 29.9999,        "73": 29.9999,        "74": 29.9999,        "75": 29.9999,
        "76": 29.9999,        "77": 29.9999,        "78": 29.9999,        "79": 29.9999,        "80": 29.9999,
        "81": 29.9999,        "82": 29.9999,        "83": 29.9999,        "84": 29.9999,        "85": 29.9999,
        "86": 29.9999,        "87": 29.9999,        "88": 29.9999,        "89": 29.9999,        "90": 29.9999,
        "91": 29.9999,        "92": 29.9999,        "93": 29.9999,        "94": 29.9999,        "95": 29.9999,
        "96": 29.9999,        "97": 29.9999,        "98": 29.9999,        "99": 29.9999,        "100": 29.9999,
        "101": 29.9999,        "102": 29.9999,        "103": 29.9999,        "104": 29.9999,        "105": 29.9999,
        "106": 29.9999,        "107": 29.9999,        "108": 29.9999,        "109": 29.9999,        "110": 29.9999,
        "111": 29.9999,        "112": 29.9999,        "113": 29.9999,        "114": 29.9999,        "115": 29.9999,
        "116": 29.9999,        "117": 29.9999,        "118": 29.9999,        "119": 29.9999,        "120": 29.9999,
        "121": 29.9999,        "122": 29.9999,        "123": 29.9999,        "124": 29.9999,        "125": 29.9999,
        "126": 29.9999,        "127": 29.9999,        "128": 29.9999,        "129": 29.9999,        "130": 29.9999,
        "131": 29.9999,        "132": 29.9999,        "133": 29.9999,        "134": 29.9999,        "135": 29.9999,
        "136": 29.9999,        "137": 29.9999,        "138": 29.9999,        "139": 29.9999,        "140": 29.9999,
        "141": 29.9999,        "142": 29.9999,        "143": 29.9999,        "144": 29.9999,        "145": 29.9999,
        "146": 29.9999,        "147": 29.9999,        "148": 29.9999,        "149": 29.9999,        "150": 29.9999,
        "151": 29.9999,        "152": 29.9999,        "153": 29.9999,        "154": 29.9999,        "155": 29.9999,
        "156": 29.9999,        "157": 29.9999,        "158": 29.9999,        "159": 29.9999,        "160": 29.9999,
        "161": 29.9999,        "162": 29.9999,        "163": 29.9999,        "164": 29.9999,        "165": 29.9999,
        "166": 29.9999,        "167": 29.9999,        "168": 29.9999,        "169": 29.9999,        "170": 29.9999,
        "171": 29.9999,        "172": 29.9999,        "173": 29.9999,        "174": 29.9999,        "175": 29.9999,
        "176": 29.9999,        "177": 29.9999,        "178": 29.9999,        "179": 29.9999,        "180": 29.9999,
        "181": 29.9999,        "182": 29.9999,        "183": 29.9999,        "184": 29.9999,        "185": 29.9999,
        "186": 29.9999,        "187": 29.9999,        "188": 30,        "189": 30,        "190": 30,
        "191": 30,        "192": 30,        "193": 30,        "194": 30,        "195": 30,
        "196": 30,        "197": 30,        "198": 30,        "199": 30,        "200": 30,
        "201": 30,        "202": 30,        "203": 30,        "204": 30,        "205": 30,
        "206": 30,        "207": 30,        "208": 30,        "209": 30,        "210": 30,
        "211": 30,        "212": 30,        "213": 30,        "214": 30,        "215": 30,
        "216": 30,        "217": 30,        "218": 30,        "219": 30,        "220": 30,
        "221": 30,        "222": 30,        "223": 30,        "224": 30,        "225": 30,
        "226": 30,        "227": 30,        "228": 30,        "229": 30,        "230": 30,
        "231": 30,        "232": 30,        "233": 30,        "234": 30,        "235": 30,
        "236": 30,        "237": 30,        "238": 30,        "239": 30,        "240": 30,
        "241": 30,        "242": 30,        "243": 30,        "244": 30,        "245": 30,
        "246": 30,        "247": 30,        "248": 30,        "249": 30,        "250": 29.9999,
        "251": 29.9999
    }
}

kapsakcj commented 1 year ago

OK, yup I think I just got unlucky with this particular accession: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR13086318&display=metadata

It seems to me that even the original FASTQs hosted on SRA are in SRA Lite format. I tried using fastq-dump and fasterq-dump v3.0.6 and still got SRA Lite formatted FASTQs.

rpetit3 commented 1 year ago

It's looking like, given fastq-dl uses sra-tools (specifically prefetch and fasterq-dump), the best I can do is to make sure we've done everything we're supposed to do to get the SRA Normalized FASTQs.

Unfortunately, after that, not much can be done about what SRA is serving up. For SRR13086318, it might be worth submitting a ticket to SRA and asking what's happening here.

Haha quite the can of worms that SRA Lite has opened!

kapsakcj commented 1 year ago

Agreed! Thank you for digging into this one. I will submit a ticket to the SRA helpdesk and see what they can tell me.

rpetit3 commented 1 year ago

@kapsakcj as a band-aid, I released v2.0.3 which explicitly sets the preference to SRA Normalized by executing vdb-config --simplified-quality-scores no before each SRA download.
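For anyone curious what that ordering looks like, here is a minimal sketch that only builds the command lines, using the vdb-config and prefetch invocations shown earlier in this thread. The function name is hypothetical and this is not the actual fastq-dl code.

```python
import shlex


def sra_download_commands(accession: str) -> list[list[str]]:
    """Build the commands to run, in order, for one SRA download."""
    return [
        # Ask sra-tools for SRA Normalized (full quality scores) first.
        ["vdb-config", "--simplified-quality-scores", "no"],
        # Then fetch the run, mirroring the prefetch call from the debug logs.
        ["prefetch", accession, "--max-size", "10T", "-o", f"{accession}.sra"],
    ]


for cmd in sra_download_commands("SRR25316086"):
    print(shlex.join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) to actually execute it. As the SRR13086318 case shows, the preference is only a request, and SRA may still serve a Lite file.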

I have to restructure things. When I do that, I'll add the --provider source option and likely a warning that the FASTQs might be SRA Lite derived if all the scores are Q30 (unless the --sra-lite option is used).
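One possible shape for that warning check, sketched against the qc_stats fields that fastq-scan prints above. The rule used here (maximum score of 30 and 25th percentile of 30) is an assumption chosen to match the outputs in this thread, not a fastq-dl feature.

```python
def maybe_sra_lite(qc_stats: dict) -> bool:
    """Heuristic: flag reads whose quality scores are capped at Q30
    and are Q30 for at least the upper three quarters of bases."""
    return qc_stats["qual_max"] == 30 and qc_stats["qual_25th"] == 30


# Values copied from the fastq-scan outputs earlier in this thread.
normalized = {"qual_min": 2, "qual_max": 38, "qual_25th": 37}
lite = {"qual_min": 30, "qual_max": 30, "qual_25th": 30}
srr13086318 = {"qual_min": 3, "qual_max": 30, "qual_25th": 30}

for name, stats in [("normalized", normalized), ("lite", lite),
                    ("SRR13086318", srr13086318)]:
    print(name, maybe_sra_lite(stats))
```

Checking the maximum and a percentile rather than requiring every score to equal 30 lets the check tolerate the few low scores seen in SRR13086318 (qual_min of 3).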

This should at least allow you to move forward and know that we've provided SRA everything expected to get SRA Normalized format.