Closed by dweemx 4 years ago
Thanks for the bug report @dweemx. From a first look, I can confirm this is indeed a bug. I will get back to you with a possible solution/explanation shortly.
Hi @dweemx, it looks like the origin of this bug is NCBI's search interface. Looking up SRP125768 on https://www.ncbi.nlm.nih.gov/sra shows only 128 hits, while the total should clearly be 136 (corresponding to the total number of runs). These are the missing run IDs:
'SRR6327103', 'SRR6327106', 'SRR6327114', 'SRR6327120', 'SRR6327118', 'SRR6327122', 'SRR6327135', 'SRR6327116'
I will have to look for a way to ensure such runs are not missed. Thanks once again for reporting this.
Hi, I contacted the SRA team and they told me that there was an issue with the SRA file pairing system when the data was ported from GEO to SRA database. This issue should be fixed now.
However, some samples are still missing when I'm using SRAweb: 'SRR6327106', 'SRR6327114', 'SRR6327120', 'SRR6327118', 'SRR6327122', 'SRR6327116'
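For anyone wanting to reproduce this kind of check, here is a minimal sketch of how missing runs could be listed, assuming you already have the full list of expected run accessions from some other source (for example, SRA's run table). The function name `missing_runs` and the short lists are hypothetical and only an illustrative subset of the IDs mentioned above, not the full 136-run set.

```python
def missing_runs(expected, returned):
    """Return accessions from `expected` that are absent from `returned`.

    The order of `expected` is preserved so the report is easy to read.
    """
    seen = set(returned)
    return [run for run in expected if run not in seen]


# Illustrative subset only (assumption): three expected runs, one returned.
expected = ["SRR6327103", "SRR6327106", "SRR6327114"]
returned = ["SRR6327103"]

print(missing_runs(expected, returned))
# ['SRR6327106', 'SRR6327114']
```

An empty list would mean that every expected run was returned.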
Thanks for the update @dweemx. It seems https://www.ncbi.nlm.nih.gov/sra/?term=SRP125768 still returns only 128 results. I will have time to work on a fix in the coming few weeks. Thanks for your patience, and sorry for the trouble this has been causing you.
Hi @dweemx, thanks for your patience. I was finally able to fix this in v0.9.9. See this notebook for an example with this ID: https://colab.research.google.com/drive/1C60V-jkcNZiaCra_V5iEyFs318jgVoUR
The web mode's default --detailed output gives all the metadata you see on SRA's run table.
> pysradb metadata SRP125768 --detailed | head
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_strategy library_source library_selection sample_accession sample_title instrument total_spots total_size run_accession run_total_spots run_total_bases run_alias experiment_alias source_name age genotype/variation tissue genotype
SRP125768 SRX4084637 GSM3142622: w1118_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142622: w1118_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301695 NextSeq 500 3552575 79516196 SRR7166639 3552575 176271295 GSM3142622_r1 GSM3142622 w1118_1d_WholeBrain_Unstranded_RNA-seq 1 Day W[1118] brain NaN
SRP125768 SRX4084636 GSM3142621: w1118_1d_WholeBrain_Stranded_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142621: w1118_1d_WholeBrain_Stranded_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301693 NextSeq 500 4513696 100655283 SRR7166638 4513696 220693988 GSM3142621_r1 GSM3142621 w1118_1d_WholeBrain_Stranded_RNA-seq 1 Day W[1118] brain NaN
SRP125768 SRX4084635 GSM3142620: DGRP-551_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142620: DGRP-551_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301694 NextSeq 500 19374029 433332434 SRR7166637 19374029 961111968 GSM3142620_r1 GSM3142620 DGRP-551_1d_WholeBrain_Unstranded_RNA-seq 1 Day DGRP-551 brain NaN
SRP125768 SRX4084634 GSM3142619: DGRP-551_1d_WholeBrain_Stranded_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142619: DGRP-551_1d_WholeBrain_Stranded_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301692 NextSeq 500 2936449 65552609 SRR7166636 2936449 145074237 GSM3142619_r1 GSM3142619 DGRP-551_1d_WholeBrain_Stranded_RNA-seq 1 Day DGRP-551 brain NaN
SRP125768 SRX4084633 GSM3142618: DGRP-551_1d_WholeBrainNuclei_Unstranded_Rep2_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142618: DGRP-551_1d_WholeBrainNuclei_Unstranded_Rep2_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301691 NextSeq 500 24342212 458751469 SRR7166635 24342212 1207043823 GSM3142618_r1 GSM3142618 DGRP-551_1d_WholeBrainNuclei_Unstranded_RNA-seq 1 Day DGRP-551 brain NaN
SRP125768 SRX4084632 GSM3142617: DGRP-551_1d_WholeBrainNuclei_Unstranded_Rep1_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142617: DGRP-551_1d_WholeBrainNuclei_Unstranded_Rep1_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301696 Illumina HiSeq 4000 7398351 236600904 SRR7166634 7398351 551705108 GSM3142617_r1 GSM3142617 DGRP-551_1d_WholeBrainNuclei_Unstranded_RNA-seq 1 Day DGRP-551 brain NaN
SRP125768 SRX4084631 GSM3142616: Adapted_SMART_seq2_R23E10_Cell_9; Drosophila melanogaster; RNA-Seq GSM3142616: Adapted_SMART_seq2_R23E10_Cell_9; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301688 NextSeq 500 267487 6409898 SRR7166633 267487 13266487 GSM3142616_r1 GSM3142616 Adapted_SMART_seq2_R23E10_Cell 0-7 Days R23E10-Gal4 x UAS-CD8::GFP brain NaN
SRP125768 SRX4084630 GSM3142615: Adapted_SMART_seq2_R23E10_Cell_8; Drosophila melanogaster; RNA-Seq GSM3142615: Adapted_SMART_seq2_R23E10_Cell_8; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301690 NextSeq 500 192550 4678011 SRR7166632 192550 9548043 GSM3142615_r1 GSM3142615 Adapted_SMART_seq2_R23E10_Cell 0-7 Days R23E10-Gal4 x UAS-CD8::GFP brain NaN
SRP125768 SRX4084629 GSM3142614: Adapted_SMART_seq2_R23E10_Cell_7; Drosophila melanogaster; RNA-Seq GSM3142614: Adapted_SMART_seq2_R23E10_Cell_7; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301689 NextSeq 500 199223 4833365 SRR7166631 199223 9885888 GSM3142614_r1 GSM3142614 Adapted_SMART_seq2_R23E10_Cell 0-7 Days R23E10-Gal4 x UAS-CD8::GFP brain NaN
Please let me know if you run into any issues.
Hi,
First, I'd like to thank you for this very useful package. I'd love to use SRAweb; unfortunately, there seems to be something wrong with it compared to SRAdb. Here are my specs,
Description
I'm trying to get the metadata from a SRA project ID (e.g.: SRP125768).
What I Did
With local SQL db,
W/o local SQL db,
I haven't checked all the entries, but there is definitely something wrong with df2: duplicated rows and missing rows.
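A quick way to confirm the duplication could look like the sketch below. It is not the code from the original report: the helper names `duplicated_runs` and `summarize` are hypothetical, and the run IDs are placeholders rather than real accessions. It only assumes that each result can be reduced to a plain list of run accessions.

```python
from collections import Counter


def duplicated_runs(run_ids):
    """Map each run accession that appears more than once to its count."""
    counts = Counter(run_ids)
    return {run: n for run, n in counts.items() if n > 1}


def summarize(run_ids):
    """Return the total row count, unique run count, and duplicated runs."""
    return {
        "rows": len(run_ids),
        "unique_runs": len(set(run_ids)),
        "duplicated": duplicated_runs(run_ids),
    }


# Placeholder IDs (assumption), standing in for a metadata table's run column.
web_runs = ["RUN_A", "RUN_B", "RUN_A", "RUN_C"]

print(summarize(web_runs))
# {'rows': 4, 'unique_runs': 3, 'duplicated': {'RUN_A': 2}}
```

With pandas DataFrames, the input could be something like `df2["run_accession"].tolist()`, assuming the table has a `run_accession` column.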
I'd be happy to get your feedback and your fix for this :)