joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

accessing data for public restroom bacteria tutorial #491

Closed brooksomics closed 8 years ago

brooksomics commented 9 years ago

I'm having trouble accessing data for many of the tutorials, including Study ID 1335 (project name Flores_restroom_surface_biogeography), which is used in the public restroom bacteria tutorial. Any time I try to access data from ftp://thebeast.colorado.edu/pub/QIIME_DB_Public_Studies/, the server times out. It seems some data has been moved to QIITA, but I can't find the Flores data. Could someone send me the relevant files for the tutorial or point me in the right direction?

audy commented 9 years ago

I'm guessing that The Beast is moving with the Knight lab. @eldeveloper or @gregcaporaso might know where to find it.

ElDeveloper commented 9 years ago

You are right, it moved to UCSD. The contents of that server were moved to this new location: ftp://ftp.microbio.me/pub


audy commented 9 years ago

@ElDeveloper QIIME_DB_Public_Studies is empty on that server.

ElDeveloper commented 9 years ago

The reason is that we are no longer maintaining that data: it was all generated and managed through https://github.com/qiime/qiime_web_app, which is no longer under active development. We are now focusing our development efforts on Qiita (https://github.com/biocore/qiita), which will have a REST API for querying data.


brooksomics commented 9 years ago

I tried finding the data on Qiita, but could not. I read here that some data is not currently available for IRB reasons, but I don't think that's an issue in this case...but maybe it is?

Does anyone happen to have the data? Or, @ElDeveloper, could you upload the project to QIITA so that people working through the public restroom bacteria tutorial can do so?

ElDeveloper commented 9 years ago

@brooksomics, thanks for the link. Although it is true that we are moving old datasets into the new system, even if the study were available through http://qiita.microbio.me, you wouldn't be able to use it directly in this tutorial:

A more immediate solution might be to update this tutorial to point to a URL that hosts the file(s) it needs.

All of that being said, our issue tracker can be found here, so please feel free to make any suggestions!

audy commented 9 years ago

All the biom files generated by Qiita use BIOM 2.x and the HDF5 backend; as far as I know the BIOM R packages do not support HDF5.

Hopefully, this will be resolved once #443 is closed.

We currently do not support downloading a single zip file where all information about a study is collated.

I guess it is time for an R package for accessing the QIITA API. @ElDeveloper is it currently possible to pull data (in biom format) from QIITA without a login/key?
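Because Qiita's BIOM files are BIOM 2.x (HDF5) and, as noted above, the R BIOM packages of the time did not read HDF5, it helps to check which encoding a given `.biom` file uses before trying to load it. A minimal standard-library Python sketch, assuming only the published file signatures: HDF5 files carry an 8-byte signature (at offset 0, or after an optional user block at 512, 1024, 2048, ...), while BIOM 1.0 files are JSON objects. The function name `biom_flavor` is hypothetical.

```python
from pathlib import Path

# Every HDF5 file carries this 8-byte format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# The signature sits at offset 0, or after an optional user block at
# offset 512, 1024, 2048, ... (doubling); checking a few offsets is enough here.
CANDIDATE_OFFSETS = (0, 512, 1024, 2048, 4096)


def biom_flavor(path):
    """Guess a BIOM file's encoding: 'hdf5' (BIOM 2.x), 'json' (BIOM 1.0), or 'unknown'."""
    with Path(path).open("rb") as fh:
        head = fh.read(max(CANDIDATE_OFFSETS) + len(HDF5_SIGNATURE))
    for offset in CANDIDATE_OFFSETS:
        if head[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return "hdf5"
    # BIOM 1.0 files are JSON objects, so the first non-blank byte is "{".
    if head.lstrip()[:1] == b"{":
        return "json"
    return "unknown"
```

If a file turns out to be HDF5, the biom-format Python package's command-line tool can write a JSON copy, e.g. `biom convert -i table.biom -o table.json --to-json`; check `biom convert --help` for the options in your installed version.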

ElDeveloper commented 9 years ago

That would be awesome!

@audy No, that's not currently possible in Qiita.


joey711 commented 9 years ago

Hey @audy @ElDeveloper , A draft of the HDF5-supporting code is already up at https://github.com/joey711/biom

@nosson and I are working on a new version of the biom format package, probably called "biomformat", that will address HDF5 more fully and within Bioconductor, rather than CRAN. Will post that as a new repo soon.

As for Qiita, I'm happy to add a function in phyloseq for pulling data via the REST API once it exists, and also to contribute to a Qiita-specific API/R code base. Depending on the complexity of the Qiita API, this might more naturally go into the "biomformat" package rather than a separate R package.

Meanwhile, I'll see what I can do to triage these broken links and unavailable data in the tutorials. Unfortunately, I don't know offhand whether I saved a copy of the restroom zip file anywhere...

ElDeveloper commented 9 years ago

That sounds awesome!


joey711 commented 9 years ago

Here is the current version of this "BIOM" package

https://github.com/joey711/biomformat

As for the Qiita API, I'd love to know right away when it is released.

joey711 commented 9 years ago

I will close this issue once the restroom tutorial has been updated with a working link or some other way to access the data.

joey711 commented 9 years ago

@ElDeveloper (Yoshiki!), any update on Qiita API, or a way in which an interested user could access files that were at one point hosted on QIIME DB? The link above (ftp://ftp.microbio.me/pub) does not seem to actually give access to any of the previously public-facing files.

Is there really no way to download the previously-public "restroom biogeography" data?

ElDeveloper commented 8 years ago

Hello @joey711, we don't yet have an API that would allow you to fetch the data programmatically. We removed the data that we were previously hosting on the FTP server. As an intermediate solution (while we get an API in place), I think I can dig up that file and host it somewhere on ftp.microbio.me. Would that help?


joey711 commented 8 years ago

That would actually help a lot. The old QIIME-DB "API" was just that FTP structure. I was about to update phyloseq to document that the interface function is now deprecated, but if you re-host that data on FTP anyway, I could simply update the URL in the interface function and it should be fully operational again. Don't worry, I will also update the docs to say it is a backward-compatibility holdover, and point users to Qiita for more recent datasets.

Sound okay?
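The reason re-hosting works is that a URL-driven interface only needs its location updated. A minimal Python sketch of that idea (phyloseq itself is written in R, so this is illustrative, not phyloseq's code): the hypothetical helper `study_zip_url` maps a study ID to an archive URL, with the host kept in a single parameter. The base location and file-name pattern are taken from the re-hosted restroom file announced later in this thread; assuming other studies follow the same pattern is an unverified generalization.

```python
# Hypothetical helper, not phyloseq's actual (R) implementation.
# Base location and file-name pattern come from the re-hosted restroom file;
# applying the pattern to other study IDs is an assumption.
NEW_BASE = "ftp://ftp.microbio.me/pub/restroom-data"


def study_zip_url(study_id, base=NEW_BASE):
    """Build the archive URL for a study; only `base` changes if the host moves."""
    return f"{base.rstrip('/')}/study_{int(study_id)}_split_library_seqs_and_mapping.zip"
```

For example, `study_zip_url(1335)` returns `ftp://ftp.microbio.me/pub/restroom-data/study_1335_split_library_seqs_and_mapping.zip`; if the files move again, only `NEW_BASE` needs to change.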

ElDeveloper commented 8 years ago

Sounds like a fantastic plan. Let me dive into the backups we have of that data, and I'll post an update as soon as I get the restroom data online.


joey711 commented 8 years ago

Great! Thanks!

ElDeveloper commented 8 years ago

@joey711, it took a while but thanks to our sys-admin's magic (:sparkles:), the file is now located in the expected location:

ftp://ftp.microbio.me/pub/restroom-data/study_1335_split_library_seqs_and_mapping.zip
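A minimal standard-library Python sketch for fetching that archive and listing its contents without extracting it. The helper names `fetch` and `list_archive` are hypothetical. The explicit `timeout` makes a stalled connection raise an error instead of hanging, which was the symptom reported at the top of this thread. The FTP host may no longer be reachable, so treat the URL as historical.

```python
import shutil
import urllib.request
import zipfile

# URL announced in this thread; the host may no longer be online.
RESTROOM_ZIP_URL = (
    "ftp://ftp.microbio.me/pub/restroom-data/"
    "study_1335_split_library_seqs_and_mapping.zip"
)


def fetch(url, dest, timeout=60):
    """Stream url into the local file dest; socket operations that stall
    for more than `timeout` seconds raise an error instead of hanging."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def list_archive(path):
    """Return the member names of a zip archive without extracting anything."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

For example, `list_archive(fetch(RESTROOM_ZIP_URL, "study_1335.zip"))` downloads the file and returns the names of the files it contains.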

ElDeveloper commented 3 years ago

This is 6 years too late, but Qiita has had a way to pull datasets programmatically for a while. The dataset needs to be public, and you need to know the study or artifact IDs you are interested in.

More information on this here.
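A hedged Python sketch of building download URLs for public Qiita data from a study ID or an artifact ID. This thread does not confirm the details. The host (`qiita.ucsd.edu`; earlier in this thread the service was at qiita.microbio.me) and the endpoint and parameter names (`public_download`, `public_artifact_download`, `data`, `study_id`, `artifact_id`) are assumptions modeled on Qiita's public-download feature. Check the Qiita documentation before relying on them.

```python
from urllib.parse import urlencode

# ASSUMPTION: host, endpoint paths, and parameter names are placeholders
# modeled on Qiita's public-download feature; verify them in the Qiita docs.
QIITA_BASE = "https://qiita.ucsd.edu"


def public_study_url(study_id, data="biom", base=QIITA_BASE):
    """URL for all public files of one kind (e.g. 'biom') from a study (assumed endpoint)."""
    query = urlencode({"data": data, "study_id": int(study_id)})
    return f"{base}/public_download/?{query}"


def public_artifact_url(artifact_id, base=QIITA_BASE):
    """URL for a single public artifact (assumed endpoint)."""
    query = urlencode({"artifact_id": int(artifact_id)})
    return f"{base}/public_artifact_download/?{query}"
```

The resulting URL can then be downloaded with any HTTP client. The helpers only build strings and do not check whether a given study is actually public.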