grp-bork / spire_contribute

Download problems #8

Closed (akques closed this issue 7 months ago)

akques commented 8 months ago

Hello! Thank you for your work. I've encountered a couple of issues while using SPIRE to explore data by taxonomy and would appreciate some help.

  1. Is there a way to download data for an entire genus at once, instead of downloading data for each SPIRE ID individually? The current process is cumbersome when dealing with a large number of samples.
  2. The download links return an error: when I try to download data using the "Download MAG" or "Download Assemble..." links, I get an "Internal Server Error". I'm not sure whether this is a network issue or something else.

I hope to receive some guidance on resolving these issues. Thank you!

akques commented 8 months ago

I have downloaded spire_v1_genome_metadata.tsv and am downloading files with commands like wget http://spire.embl.de/download_file/${genome_id}, but I would also like to know what these files contain:

  1. Cluster Metadata: does it include representatives from both pg3 and SPIRE?
  2. Genome Metadata: does it include only SPIRE MAGs (by spire_id)?

So, if I want to download all sequences from both pg3 and SPIRE, do I have to separately download the MAGs in SPIRE and the isolates and MAGs in pg3?

akques commented 8 months ago

I also want to ask about the Microntology. I downloaded it and found that it does not include all of the data and contains many repeated entries. Why is that? [image]

akques commented 7 months ago

I am looking for an easier way to download the entire database. In your paper, you mention: 'SPIRE encompasses 99,146 metagenomic samples from 739 studies, covering a wide array of microbial environments, and is augmented with manually curated contextual data.' However, on the website I only found information about 719 studies. Could you please explain this discrepancy? The page I am referring to is http://spire.embl.de/environment.

I have attached the genome IDs and am downloading with wget, but I would greatly appreciate a list of all MD5 checksums to verify my downloads.

fullama commented 7 months ago

> Hello! Thank you for your work. I've encountered a couple of issues while using SPIRE to explore data by taxonomy and would appreciate some help.
>
>   1. Is there a way to download data for an entire genus at once, instead of downloading data for each SPIRE ID individually? The current process is cumbersome when dealing with a large number of samples.
>   2. The download links return an error: when I try to download data using the "Download MAG" or "Download Assemble..." links, I get an "Internal Server Error". I'm not sure whether this is a network issue or something else.
>
> I hope to receive some guidance on resolving these issues. Thank you!

  1. Unfortunately not right now. It's a technical limitation, but we have plans to overcome it.
  2. This may have been bad timing, as our filesystem underwent an upgrade. If you have any specific examples of links that are consistently broken, do let me know and I can investigate.
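
Until a bulk download exists, one possible stopgap is to select a genus from the genome metadata table and fetch its genomes one by one. The sketch below is not an official SPIRE tool: the URL pattern is the one quoted by akques earlier in this thread, the column names genome_id and genus are assumptions about spire_v1_genome_metadata.tsv that should be checked against the file header, and the function names are made up for illustration. It also retries on HTTP 5xx responses, in case an "Internal Server Error" turns out to be transient.

```python
import csv
import time
import urllib.error
import urllib.request

# URL pattern as quoted earlier in this thread; not an officially documented API.
BASE_URL = "http://spire.embl.de/download_file/"


def genome_ids_for_genus(tsv_lines, genus, id_col="genome_id", genus_col="genus"):
    """Return the IDs of all rows whose genus column equals `genus`.

    `tsv_lines` is an open file or any iterable of lines with a header row.
    `id_col` and `genus_col` are ASSUMED column names; adjust them to match
    the header of your copy of spire_v1_genome_metadata.tsv.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    return [row[id_col] for row in reader if row.get(genus_col) == genus]


def download_url(genome_id):
    """Build the per-genome download URL from the pattern above."""
    return BASE_URL + genome_id


def fetch_with_retry(url, dest, attempts=3, wait_seconds=5.0,
                     fetch=urllib.request.urlretrieve):
    """Download `url` to `dest`, retrying when the server answers with HTTP 5xx.

    Returns True on success and False if every attempt failed with a 5xx.
    Other HTTP errors (for example 404) are re-raised immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, dest)
            return True
        except urllib.error.HTTPError as err:
            if err.code < 500:
                raise
            if attempt < attempts:
                time.sleep(wait_seconds)
    return False
```

To use it, open the metadata file with open("spire_v1_genome_metadata.tsv", newline=""), pass the handle and the genus string exactly as it is written in the table to genome_ids_for_genus, then call fetch_with_retry(download_url(genome_id), destination) for each ID with a destination path of your choice.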

fullama commented 7 months ago

> I have downloaded spire_v1_genome_metadata.tsv and am downloading files with commands like wget http://spire.embl.de/download_file/${genome_id}, but I would also like to know what these files contain:
>
>   1. Cluster Metadata: does it include representatives from both pg3 and SPIRE?
>   2. Genome Metadata: does it include only SPIRE MAGs (by spire_id)?
>
> So, if I want to download all sequences from both pg3 and SPIRE, do I have to separately download the MAGs in SPIRE and the isolates and MAGs in pg3?

Yes

fullama commented 7 months ago

> I also want to ask about the Microntology. I downloaded it and found that it does not include all of the data and contains many repeated entries. Why is that? [image]

This was fixed here: https://github.com/grp-bork/spire_contribute/issues/9

fullama commented 7 months ago

> I am looking for an easier way to download the entire database. In your paper, you mention: 'SPIRE encompasses 99,146 metagenomic samples from 739 studies, covering a wide array of microbial environments, and is augmented with manually curated contextual data.' However, on the website I only found information about 719 studies. Could you please explain this discrepancy? The page I am referring to is http://spire.embl.de/environment.
>
> I have attached the genome IDs and am downloading with wget, but I would greatly appreciate a list of all MD5 checksums to verify my downloads.

I took some studies down temporarily because some files seemed incomplete. I will put them back once I have verified that everything is correct.

Our plans for making the data easier to download should also include MD5 checksums, but I will look into another solution that I could maybe provide in the meantime.
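
In the meantime, checksums can at least be recorded locally. The sketch below computes MD5 digests of downloaded files and writes them in the two-space layout that md5sum -c reads. It cannot confirm that a file matches the server's copy, because no official checksum list exists yet; it only helps to compare repeated downloads of the same file or to detect later changes on disk. The function names and the manifest file are illustrative assumptions, not part of SPIRE.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large genome files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths, manifest_path):
    """Write one '<md5>  <path>' line per file (the md5sum -c layout)."""
    with open(manifest_path, "w") as out:
        for path in paths:
            out.write(f"{md5_of_file(path)}  {path}\n")
```

Downloading a file twice and comparing the two digests is a cheap way to spot a truncated transfer; once official MD5 lists are published, the same digests can be compared against them.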