vrmarcelino / CCMetagen

Microbiome classification pipeline
GNU General Public License v3.0

Prebuilt indexes unavailable #60

Open MortenEneberg opened 10 months ago

MortenEneberg commented 10 months ago

Dear developers!

I am looking to test the KMA+CCMetagen pipeline on our metagenomic data, and would like to run some tests with the prebuilt indexes before I invest the time to build custom ones. However, it seems that they are no longer available via the link provided in the README. After pressing the 'Access the full data' button, I am led to a site that says 'CloudStor has been decommissioned'. Are the indexes available anywhere else?

Kind regards, Morten

vinisalazar commented 10 months ago

Dear @MortenEneberg,

Thank you for reporting this. We are aware of this problem and are currently generating an updated version of the prebuilt indexes, which should be released soon. I will update this issue when they are available.

Best, Vini

igsbma commented 10 months ago

Hi Vini, do you have an estimate of when this database will be ready? Thanks!

vrmarcelino commented 10 months ago

Hi @igsbma,

It is difficult to estimate (it will depend, for example, on our institution providing a suitable long-term storage solution). In the interim, I have enquired about ways of making the original database available again, and I am hoping to get a reply within this week.

Apologies for the inconvenience.

igsbma commented 9 months ago

Thank you for your updates. The original database would be enough to start from. Looking forward to it :)

vrmarcelino commented 9 months ago

Hi @igsbma and @MortenEneberg,

Here are the links to the original databases. Assuming you have curl, the NCBI index database (without env sequences) can be downloaded with:

curl "https://mediaflux.researchsoftware.unimelb.edu.au:443/mflux/share.mfjp?_token=DGaqucdYelmUiafheIc511282424042&browser=true&filename=ncbi_nt_no_env_11jun2019.zip" -d browser=false -o ncbi_nt_no_env_11jun2019.zip

And the RefSeq:

curl "https://mediaflux.researchsoftware.unimelb.edu.au:443/mflux/share.mfjp?_token=SpQnhaAaoOiznLV47Yw111282424044&browser=true&filename=RefSeq_bf.zip" -d browser=false -o RefSeq_bf.zip

These links will expire in 30 days (by then I believe we can provide the updated reference).

You can also use wget (but you need to add a browser=false argument). Let me know if there are any issues.
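For the wget route, a minimal sketch could look like the block below. It assumes that wget's --post-data option plays the same role as curl's -d here (both send browser=false as a form-encoded POST body); this has not been checked against the Mediaflux server. The function name fetch_with_wget is hypothetical and only used for illustration.

```shell
# Hypothetical helper: download a share link with wget instead of curl.
# Assumption: the server accepts wget's POST body the same way it accepts curl's -d.
fetch_with_wget() {
  share_url="$1"   # full share URL, including the _token and filename parameters
  out_file="$2"    # local name for the downloaded archive
  wget --post-data "browser=false" -O "$out_file" "$share_url"
}
```

It would be called with the same URL and output name as in the curl commands above, for example fetch_with_wget "<share URL>" ncbi_nt_no_env_11jun2019.zip.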

vinisalazar commented 9 months ago

Hello @igsbma, and thank you @vrmarcelino for the updates.

Just to reiterate what @vrmarcelino said, we are working on how to handle the large size of present NCBI databases, and liaising with our digital services team to find a long-term data storage solution. We expect to provide an official database release by the end of January at the latest.

Best, Vini

igsbma commented 9 months ago

Thank you very much, it runs OK using the temporary solution. Appreciate your help!

humbleflowers commented 8 months ago

Any update on this? The temporary URLs don't work now. @vinisalazar

vrmarcelino commented 8 months ago

Hi @humbleflowers, yes, we have a newer version of the database, which will be shared very soon.

The original databases can be found here: https://melbourne.figshare.com/projects/CCMetagen_databases/195755

vrmarcelino commented 8 months ago

Hi all,

If you want a beta version of the latest NCBI nt database, you can download it with:

curl "https://mediaflux.researchsoftware.unimelb.edu.au/mflux/share.mfjp?_token=Nuq13YhFxEDbMaQkiSBj1128252375&browser=true&filename=ccmetagen_nt_filtered_db_v2_Mar_2024.tar.gz" -d browser=false -o ccmetagen_nt_filtered_db_v2_Mar_2024.tar.gz
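Since these share links are temporary, it may be worth checking that the downloaded file really is a gzip archive before extracting it; an expired link might leave something else (such as an error page) saved under the archive name. The sketch below is one way to do that, assuming gzip and tar are installed. The function name check_and_extract and the destination directory are hypothetical.

```shell
# Hypothetical helper: verify a downloaded .tar.gz and unpack it.
check_and_extract() {
  archive="$1"      # e.g. ccmetagen_nt_filtered_db_v2_Mar_2024.tar.gz
  dest="${2:-.}"    # target directory, defaults to the current one

  # gzip -t exits non-zero when the file is not valid gzip data.
  if ! gzip -t "$archive" 2>/dev/null; then
    echo "not a valid gzip archive: $archive" >&2
    return 1
  fi

  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
}
```

For example: check_and_extract ccmetagen_nt_filtered_db_v2_Mar_2024.tar.gz db_v2, where db_v2 is just an illustrative directory name.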