Closed: kpj closed this issue 1 year ago.
Thanks for the bug report @kpj! I think this query returns two runs because the same thing happens when you search for it on the NCBI SRA website. For example, see https://www.ncbi.nlm.nih.gov/sra/?term=SRR12169246. That said, it can be handled internally; I will get to it this week.
Thanks! I came across a similar issue when fetching metadata manually and ended up subsetting the dataframe.
Maybe there's a better way of handling this.
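A minimal sketch of that subsetting, assuming each metadata row exposes a `run_accession` field (the column name mentioned later in this thread). It uses plain dictionaries so it runs without pysradb or pandas; with a pandas DataFrame the equivalent filter would be `df[df["run_accession"].isin(accessions)]`. The function name is hypothetical and the second run accession is a made-up placeholder.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def keep_requested_runs(rows: Iterable[Row], accessions: Iterable[str]) -> List[Row]:
    """Keep only rows whose run_accession was explicitly requested.

    Hypothetical helper; assumes every row carries a "run_accession" key,
    mirroring the column discussed in this thread.
    """
    wanted = set(accessions)
    return [row for row in rows if row.get("run_accession") in wanted]


# Placeholder rows: "SRR0000001" is made up to stand in for the extra run.
rows = [
    {"run_accession": "SRR12169246"},
    {"run_accession": "SRR0000001"},
]
subset = keep_requested_runs(rows, ["SRR12169246"])
print(len(subset))  # 1
```

Filtering against the queried set, rather than calling a generic duplicate removal, keeps the result size equal to the number of requested runs even when NCBI returns extra ones.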
For now, I would recommend the fix you have in place. It is slightly tricky to deal with this internally, given that the passed-in argument could be any accession type (SRP/SRR/SRX/GSM, etc.). The problem does not originate on the pysradb end but in what the NCBI search itself returns (see the comment above).
Is the main issue figuring out which column to detect duplicates in, or which column to select the accessions from?
In that case, it might be an idea to add a parameter such as `duplicate_accession_removal_column`, which would be `run_accession` when the input accessions are of the form `ERR4413803`.
This is certainly not very elegant and maybe there are other issues making this more difficult, so I am happy either way :)
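One way the proposed `duplicate_accession_removal_column` idea could be sketched: infer the column from the accession prefix, then keep only rows whose value in that column was queried. Everything here is hypothetical. The function names and the prefix-to-column mapping are assumptions, only `run_accession` is a column name confirmed in this thread, and GSM-style accessions are deliberately not handled.

```python
import re
from typing import Dict, Iterable, List, Optional, Set

Row = Dict[str, str]

# Hypothetical mapping from the accession type (ignoring the SRA/ENA/DDBJ
# first letter S/E/D) to a metadata column. Only "run_accession" is
# confirmed in this thread; the other column names are assumptions.
TYPE_TO_COLUMN = {
    "RR": "run_accession",         # SRR / ERR / DRR
    "RX": "experiment_accession",  # SRX / ERX / DRX
    "RS": "sample_accession",      # SRS / ERS / DRS
    "RP": "study_accession",       # SRP / ERP / DRP
}

ACCESSION_RE = re.compile(r"^[SED](R[RXSP])\d+$")


def column_for_accession(accession: str) -> Optional[str]:
    """Return the column to match on, or None for unsupported types (e.g. GSM)."""
    match = ACCESSION_RE.match(accession)
    return TYPE_TO_COLUMN.get(match.group(1)) if match else None


def filter_to_queried(rows: Iterable[Row], accessions: Iterable[str]) -> List[Row]:
    """Keep rows matching a queried accession in its inferred column.

    If no queried accession has a recognised type, rows are returned
    unchanged, since there is nothing reliable to filter on.
    """
    wanted: Dict[str, Set[str]] = {}
    for acc in accessions:
        column = column_for_accession(acc)
        if column is not None:
            wanted.setdefault(column, set()).add(acc)
    if not wanted:
        return list(rows)
    return [
        row
        for row in rows
        if any(row.get(col) in ids for col, ids in wanted.items())
    ]


# Placeholder rows: "ERR0000001" and "ERX_PLACEHOLDER" are made up.
rows = [
    {"run_accession": "ERR4413803", "experiment_accession": "ERX_PLACEHOLDER"},
    {"run_accession": "ERR0000001", "experiment_accession": "ERX_PLACEHOLDER"},
]
print(column_for_accession("ERR4413803"))  # run_accession
print(len(filter_to_queried(rows, ["ERR4413803"])))  # 1
```

Falling back to the unfiltered rows for unrecognised types (such as GSM) avoids silently returning nothing, at the cost of leaving those queries unfiltered.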
I ran into the same problem. I am also confused about the relationship between multiple SRR IDs within a single SRX ID: are these SRR IDs technical replicates from a shared sequencing library? The NCBI documentation left me quite confused, and I would appreciate hearing your understanding of this.
Yes, SRRs for the same SRX are technical replicates. Here are some slides that might help: https://f1000research.com/slides/8-1183
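Since runs under one SRX are technical replicates, a common downstream step is to group run accessions by their experiment. A minimal sketch, assuming each metadata row has `experiment_accession` (an assumed column name) and `run_accession` fields; the helper name is hypothetical and the accessions are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, str]


def runs_per_experiment(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Map each experiment (SRX) to the runs (SRR) recorded under it.

    Runs sharing an experiment are technical replicates, so they are often
    pooled before analysis.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row["experiment_accession"]].append(row["run_accession"])
    return dict(groups)


# Placeholder accessions for illustration only.
rows = [
    {"experiment_accession": "SRX0000001", "run_accession": "SRR0000001"},
    {"experiment_accession": "SRX0000001", "run_accession": "SRR0000002"},
    {"experiment_accession": "SRX0000002", "run_accession": "SRR0000003"},
]
print(runs_per_experiment(rows))
# {'SRX0000001': ['SRR0000001', 'SRR0000002'], 'SRX0000002': ['SRR0000003']}
```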
Many thanks for your quick reply!!
In passing, I would like to raise another problem I ran into while using pysradb. The metadata I fetch with `pysradb metadata --detailed` does not include some important information.
For example, I want to obtain the antibody information for a ChIP-seq experiment ([SRX027872](https://www.ncbi.nlm.nih.gov/sra/SRX027872%5Baccn%5D)). On the NCBI website I can see the antibody information (in the "Experiment attributes" section), but there is no corresponding information in the metadata I fetch with pysradb.
@sheep-liu thanks for bringing it to my attention. I have pushed https://github.com/saketkc/pysradb/commit/7da562f86fe759f737b25f6581a8c44a9437b5b4, which enables fetching the experiment protocol. It will be in the next release (you can install the development version from GitHub for now).
In the future, please open a new issue. I will close this one for now, as I think the original issue is best handled downstream.
Roger! And thanks a lot.
Describe the bug
In some cases, when using `SRAweb.sra_metadata` with a single run accession, multiple metadata rows are returned. It would seem more sensible to return only the metadata for the requested run accession. This is problematic, e.g., when retrieving metadata for a list of samples and expecting the number of rows to equal the number of queried samples.

To Reproduce
Execute the following code:
Desktop:
- OS: Linux
- Python version: 3.8.5
- pysradb version: 0.11.2-dev0