saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

allow namespace #65

Maarten-vd-Sande closed 4 years ago

Maarten-vd-Sande commented 4 years ago

See issue #64. I have practically zero experience with XML parsing, but the xmlns attribute specifies the XML namespace for the document. Simply setting process_namespaces to True seems to resolve the issue for me: https://github.com/martinblech/xmltodict#namespace-support
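For readers unfamiliar with what an xmlns declaration does, here is a minimal sketch using only Python's standard library (xml.etree.ElementTree), not xmltodict and not pysradb's actual parsing code. The XML string, the namespace URI, and the element names are made up for illustration. It shows that a default namespace becomes part of every element's name, so a lookup by bare tag name can silently miss.

```python
import xml.etree.ElementTree as ET

# Hypothetical document (not real SRA output): the root declares a default namespace.
XML = '<ROOT xmlns="http://example.org/ns"><LAYOUT>PAIRED</LAYOUT></ROOT>'

root = ET.fromstring(XML)

# ElementTree folds the declared namespace into each tag as "{uri}local-name".
print(root.tag)

# A lookup by bare name finds nothing, because the child's full name
# includes the namespace URI.
plain = root.find("LAYOUT")
print(plain)

# A namespace-aware lookup maps a prefix of our choosing to the same URI.
ns = {"x": "http://example.org/ns"}
qualified = root.find("x:LAYOUT", ns)
layout = qualified.text
print(layout)
```

xmltodict handles namespaces through its own options, described in the README linked above. This sketch only illustrates why a parser that ignores the declaration and one that honours it can disagree about element names.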

import pysradb
db = pysradb.SRAweb()
df = db.sra_metadata(["GSM1013144", "GSM2520660"], detailed=True)
print(df)
print(df.library_layout)
  run_accession study_accession experiment_accession  ... ena_fastq_ftp                                    ena_fastq_ftp_1                                    ena_fastq_ftp_2
0    SRR5310913       SRP101305           SRX2610883  ...           N/A  era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR531/...  era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR531/...
1     SRR578686       SRP000941            SRX190781  ...           N/A  era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR578/...                                                   

[2 rows x 46 columns]
0    PAIRED
1    SINGLE
Name: library_layout, dtype: object

saketkc commented 4 years ago

The failures are unrelated to this change (they should be fixed sometime in the future).

saketkc commented 4 years ago

Thanks a lot @Maarten-vd-Sande! Appreciate all your contributions so far!