waldronlab / BugSigDBExports

BugSigDB data files

BugSigDB 1.0 release #4

Closed lgeistlinger closed 2 years ago

lgeistlinger commented 3 years ago

Hi @jwokaty:

@lwaldron and I had started to discuss a release scheme for BugSigDB.

One idea was to follow Bioconductor's semi-annual release scheme and publish a stable release of BugSigDB signatures every six months. We also discussed Zenodo as the platform for hosting the stable release (i.e., CSV files for studies, experiments, and signatures).

A stable release is supposed to contain all reviewed content from BugSigDB up to a defined freeze date. For the BugSigDB 1.0 release, this could encompass reviewed content up to the present date for simplicity or, if we want to synchronize with Bioconductor, up to the most recent (3.13) release date.

Would you like to go ahead and export the content, filter it by date and review status, upload it to Zenodo, and add the stable release link to the export page (https://bugsigdb.org/Help:Export)?
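The "filter by date and review status" step can be sketched as follows. This is a minimal Python illustration, not the bugsigdbr implementation: it assumes the export has been read into row dictionaries (e.g. with `csv.DictReader`), and the column names `review_status` and `date`, the accepted status value, and the freeze date are hypothetical placeholders; the real BugSigDB export columns may be named differently.

```python
import csv
import io
from datetime import date


def freeze_rows(rows, freeze_date, status_col="review_status", date_col="date",
                accepted=("reviewed",)):
    """Keep rows that are reviewed and dated on or before the freeze date.

    Column names and accepted status values are hypothetical placeholders.
    """
    kept = []
    for row in rows:
        status = (row.get(status_col) or "").strip().lower()
        if status not in accepted:
            continue  # drop unreviewed content
        if date.fromisoformat(row[date_col]) > freeze_date:
            continue  # drop content added after the freeze date
        kept.append(row)
    return kept


# Illustrative input only; not real BugSigDB content.
EXAMPLE_CSV = """study,review_status,date
Study A,Reviewed,2021-03-01
Study B,unreviewed,2021-03-02
Study C,reviewed,2021-09-01
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
frozen = freeze_rows(rows, freeze_date=date(2021, 6, 30))
print([row["study"] for row in frozen])  # ['Study A']
```

Study B is dropped for its review status and Study C for being dated after the (hypothetical) freeze date.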

Thanks!

lgeistlinger commented 3 years ago

Note that functionality from our bugsigdbr package will likely be helpful here, including bugsigdbr::importBugSigDB to pull the data frame, which can then be filtered along the usual lines, bugsigdbr::getSignatures to extract the signatures from the filtered data frame, and bugsigdbr::writeGMT to write a GMT file containing the signatures.
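For readers unfamiliar with GMT, here is a minimal Python sketch of the layout such a file uses: one signature per line, with the signature name, a description, and then the members, all tab-separated. It is not bugsigdbr::writeGMT itself; the signature names and taxa below are hypothetical, and the reader skipping `#` lines is an assumption made to accommodate a comment header line like the one proposed later in this thread.

```python
import os
import tempfile


def write_gmt(signatures, path, description="NA"):
    """Write {signature_name: [taxon, ...]} to a GMT file (tab-separated, one signature per line)."""
    with open(path, "w", encoding="utf-8") as fh:
        for name, members in signatures.items():
            fh.write("\t".join([name, description, *members]) + "\n")


def read_gmt(path):
    """Read a GMT file back into {signature_name: [taxon, ...]}, skipping '#' comment lines."""
    signatures = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            name, _description, *members = line.rstrip("\n").split("\t")
            signatures[name] = members
    return signatures


# Hypothetical signature names and members, for illustration only.
sigs = {
    "example_signature_up": ["Bacteroides", "Akkermansia"],
    "example_signature_down": ["Faecalibacterium"],
}

with tempfile.TemporaryDirectory() as tmp:
    gmt_path = os.path.join(tmp, "example.gmt")
    write_gmt(sigs, gmt_path)
    roundtrip = read_gmt(gmt_path)

print(roundtrip == sigs)  # True
```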

lwaldron commented 3 years ago

I added a new organization secret ZENODO_DEPOSIT with an access token that should allow depositing through the Zenodo API from GitHub Actions in this repo, bugsigdbr, and bugphyzz. Zenodo also has an option for pulling releases directly from any release of a GitHub repo, although that would require either depositing a software + data repo or creating separate data-only repos. It would be great if you could look into these options a bit, @jwokaty; then we can discuss the best way to do periodic data releases for these projects.

lgeistlinger commented 3 years ago

Hi @jwokaty @lwaldron: I thought I'd quickly check in with you on the status of this and whether any input or help from my side is needed.

jwokaty commented 3 years ago

@lgeistlinger I'd like to better understand what the release looks like. Will it be 3 csv files: one each for studies, experiments, and signatures? Or will there be different versions in content or file formats?

There's also a place where I can add metadata. I'll try to write it using the website and the repository, but maybe I need some help with the following:

Are there any specific keywords we should associate with the data? This isn't required, but may help users find it.

lgeistlinger commented 3 years ago

Thanks for checking in @jwokaty.

Will it be 3 csv files: one each for studies, experiments, and signatures?

That's the core, yes. This is also what I think bugsigdbr will eventually pull from Zenodo. My thought would be that

bugsigdbr::importBugSigDB

will get a version argument so once the 1.0 release is out on zenodo, users could pull the stable release via:

bugsigdbr::importBugSigDB(version = "1.0")

Currently it directly pulls the bleeding edge, including unreviewed content, from https://bugsigdb.org/Help:Export, which would then become available via:

bugsigdbr::importBugSigDB(version = "devel")
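The proposed `version` argument amounts to a small dispatch: "devel" points at the continuously updated export, while a release number points at a frozen, reviewed-only snapshot. The Python sketch below only illustrates that dispatch logic; it is not bugsigdbr's (R) API, and the stable-release entry is a hypothetical placeholder rather than a real Zenodo record.

```python
DEVEL_SOURCE = "https://bugsigdb.org/Help:Export"

# Hypothetical lookup table: stable versions would map to their Zenodo records
# once published. The value below is a placeholder, not a real record.
STABLE_SOURCES = {
    "1.0": "<zenodo-record-for-bugsigdb-1.0>",
}


def resolve_source(version="devel"):
    """Return where a given BugSigDB version would be pulled from."""
    if version == "devel":
        return DEVEL_SOURCE  # bleeding edge, including unreviewed content
    if version in STABLE_SOURCES:
        return STABLE_SOURCES[version]  # frozen, reviewed-only release
    known = ", ".join(["devel", *sorted(STABLE_SOURCES)])
    raise ValueError(f"unknown BugSigDB version {version!r}; expected one of: {known}")


print(resolve_source())       # https://bugsigdb.org/Help:Export
print(resolve_source("1.0"))  # <zenodo-record-for-bugsigdb-1.0>
```

Failing loudly on an unknown version, rather than silently falling back to "devel", keeps users from unknowingly analyzing unreviewed content.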

When preparing the three CSV files for the 1.0 release, it's important to restrict them to reviewed content only. Let me know if you have questions or if it's easier for me to provide these files.

In addition, I think we want to supplement the core release with GMT files containing the actual signatures. There are many ways to extract signatures based on taxonomic considerations, and bugsigdbr::getSignatures implements a number of these options. For non-R users, however, I think we want to provide at least 3 GMT files:

@lwaldron let us know if you have thoughts on this.

Are there any specific keywords we should associate with the data? This isn't required, but may help users find it.

A couple that come to mind:

Is there a license associated with this data? Zenodo lists the following licenses (but I think we can do something else):

@lwaldron might have opinions. I'd be good with Creative Commons Attribution 4.0 International.
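Keywords and license end up in the deposit metadata, which the GitHub integration reads from a `.zenodo.json` file (mentioned later in this thread). The Python sketch below only shows the general shape of such a file; it is not a complete or validated `.zenodo.json`. The title and keywords are hypothetical placeholders (the agreed keywords are not recorded here), and the `cc-by-4.0` license identifier is an assumption to check against Zenodo's documentation.

```python
import json

# Hypothetical .zenodo.json content: placeholder values, not the project's actual file.
zenodo_metadata = {
    "title": "<release title>",
    "upload_type": "dataset",
    "description": "Reviewed BugSigDB studies, experiments, and signatures.",
    "license": "cc-by-4.0",  # assumed identifier for CC BY 4.0; verify with Zenodo
    "keywords": ["<keyword-1>", "<keyword-2>"],  # placeholders only
}

# Serialize to the text that would be committed as .zenodo.json.
zenodo_json = json.dumps(zenodo_metadata, indent=2)
print(zenodo_json)
```

As noted later in the thread, Zenodo is very particular about this file, so it is worth validating it before relying on automatic releases.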

lwaldron commented 3 years ago

@lgeistlinger would you write a simple .R script that dumps all the required files? I guess it'll be a few: gmt, cab, ncbi ID, names, genus, species, mixed. No need to put versioning or dates in file names, but maybe a comment line in the first line with the date, license, and reference to bugsigdb.org?

lgeistlinger commented 3 years ago

Sure.

lgeistlinger commented 3 years ago

That is basically done and part of bugsigdbr now:

https://github.com/waldronlab/bugsigdbr/blob/main/inst/scripts/dump_release.R

Call the script via:

Rscript dump_release.R <version> <output.directory>

which will produce the following output files:

full_dump.tab 
bugsigdb_signatures_mixed_metaphlan.gmt
bugsigdb_signatures_mixed_ncbi.gmt
bugsigdb_signatures_mixed_taxname.gmt
bugsigdb_signatures_genus_metaphlan.gmt     
bugsigdb_signatures_genus_metaphlan_exact.gmt   
bugsigdb_signatures_genus_ncbi.gmt      
bugsigdb_signatures_genus_ncbi_exact.gmt    
bugsigdb_signatures_genus_taxname.gmt       
bugsigdb_signatures_genus_taxname_exact.gmt
bugsigdb_signatures_species_metaphlan.gmt
bugsigdb_signatures_species_metaphlan_exact.gmt
bugsigdb_signatures_species_ncbi.gmt
bugsigdb_signatures_species_ncbi_exact.gmt
bugsigdb_signatures_species_taxname.gmt
bugsigdb_signatures_species_taxname_exact.gmt
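The file names above follow a regular pattern: `full_dump.tab` plus one GMT file per taxonomic level (mixed, genus, species) and identifier type (metaphlan, ncbi, taxname), with an additional `_exact` variant for genus and species only. A hypothetical Python helper that regenerates the expected names and reports which ones are missing from an output directory could serve as a sanity check after an export; the pattern is inferred from this listing, not taken from dump_release.R.

```python
import os
import tempfile

LEVELS = ("mixed", "genus", "species")
ID_TYPES = ("metaphlan", "ncbi", "taxname")


def expected_release_files():
    """File names following the naming pattern of the dump_release.R output."""
    files = ["full_dump.tab"]
    for level in LEVELS:
        for id_type in ID_TYPES:
            files.append(f"bugsigdb_signatures_{level}_{id_type}.gmt")
            # Only genus- and species-level signatures come in an "_exact" variant.
            if level != "mixed":
                files.append(f"bugsigdb_signatures_{level}_{id_type}_exact.gmt")
    return files


def missing_release_files(directory):
    """Return expected release files that are absent from `directory`."""
    present = set(os.listdir(directory))
    return [name for name in expected_release_files() if name not in present]


print(len(expected_release_files()))  # 16

with tempfile.TemporaryDirectory() as out_dir:
    open(os.path.join(out_dir, "full_dump.tab"), "w").close()
    print(len(missing_release_files(out_dir)))  # 15
```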

Pending waldronlab/BugSigDB#92, a filter by review status will be incorporated.

jwokaty commented 3 years ago

@lgeistlinger I am working on this at https://github.com/jwokaty/BugSigDBExports, which I will transfer to waldronlab when it's in a good state. I've created a GitHub action that we can run manually to generate exports with dump_release.R.

@lwaldron and I had discussed setting up the action to do a daily export that would be committed to BugSigDBExports. We would do a manual release to get into Zenodo.

However, the version number that is passed to dump_release is just a label, right? I thought if possible it might be better to associate the files with a date corresponding to reviewed content rather than a version number. When we do a release to Zenodo, there will still be a version number, but I think we want to be able to reproduce the files whether we get it from BugSigDBExports, Zenodo, or bugsigdbr.

lgeistlinger commented 3 years ago

I think that is great.

However, the version number that is passed to dump_release is just a label, right?

This is correct. The only place where the version argument is used in dump_release.R is the header line of the output files, e.g.:

# BugSigDB 0.0.1, License: Creative Commons Attribution 4.0 International, URL: https://bugsigdb.org

If you (with slight notational abuse) pass a date instead of a version number to dump_release.R, the header line changes accordingly, e.g.:

# BugSigDB 2021-07-27, License: Creative Commons Attribution 4.0 International, URL: https://bugsigdb.org

and the argument can then also be renamed to date in the script.
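Because the label is only interpolated into that header line, building and parsing the header is straightforward. The Python sketch below reproduces the two header lines shown above and classifies a label as an ISO date or a version string; the helper names are hypothetical and the regular expression assumes the label and license contain no commas.

```python
import re
from datetime import date

LICENSE = "Creative Commons Attribution 4.0 International"
URL = "https://bugsigdb.org"

HEADER_RE = re.compile(
    r"^# BugSigDB (?P<label>[^,]+), License: (?P<license>[^,]+), URL: (?P<url>\S+)$"
)


def make_header(label):
    """Build the comment header line for a version or date label."""
    return f"# BugSigDB {label}, License: {LICENSE}, URL: {URL}"


def parse_header(line):
    """Split a header line into its label, license, and URL fields."""
    match = HEADER_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a BugSigDB header line: {line!r}")
    return match.groupdict()


def label_kind(label):
    """Classify a label as an ISO date (YYYY-MM-DD) or a version string."""
    try:
        date.fromisoformat(label)
        return "date"
    except ValueError:
        return "version"


print(make_header("0.0.1"))
print(make_header(date(2021, 7, 27).isoformat()))
```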

lwaldron commented 3 years ago

It looks great! Nice use of the .zenodo.json file too.

lwaldron commented 3 years ago

And BTW, I think the Zenodo releases can just contain dates too: each release will have a version number from the tag, but the files contain dates as usual.

jwokaty commented 3 years ago

We are still waiting on waldronlab/BugSigDB#92 to close before we publish because this may change the content, correct? I've transferred the BugSigDBExports to waldronlab and have scheduled it to do weekly exports on Sunday using dates. The first will be this Sunday. Does this issue really belong to the new repository? I can transfer it.
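If weekly exports are labeled by date, one option is to label each export with the Sunday it belongs to, so a scheduled run that starts late still gets a predictable label. The helper below is a hypothetical Python sketch of that choice; the thread does not say whether the GitHub Action labels exports by run date or by the scheduled Sunday.

```python
from datetime import date, timedelta


def export_date_label(today):
    """Return the ISO date of the most recent Sunday on or before `today`."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_since_sunday = (today.weekday() + 1) % 7
    return (today - timedelta(days=days_since_sunday)).isoformat()


print(export_date_label(date(2021, 7, 27)))  # 2021-07-25
```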

lwaldron commented 3 years ago

Yes, this issue does belong in https://github.com/waldronlab/BugSigDBExports.

lwaldron commented 3 years ago

Sorry for being impatient and just transferring it, was only so I could refer to it from https://github.com/waldronlab/BugSigDB/issues/92 (although I just learned that those references get automatically updated with the transfer!)

lwaldron commented 3 years ago

I set up the Zenodo integration and did a little debugging trying to get it to work (https://zenodo.org/account/settings/github/repository/waldronlab/BugSigDBExports#) but am stuck now with the following error on Zenodo:

{
    "errors": "Something went wrong when we tried to publish your release. If your release has not been published within the next hour, please contact us via our support form to resolve this issue."
}

Let's see if it fixes itself within the next hour, otherwise I'll contact the Zenodo support team. It's very particular about the .zenodo.json file.

lgeistlinger commented 3 years ago

@jwokaty @lwaldron quick update: I incorporated a filter for complete content in the dump release script, so the only thing remaining for the 1.0 release to zenodo is to clean up the ontology columns (https://github.com/waldronlab/BugSigDB/issues/92#issuecomment-916309509). Will work on that so that we get that through the door prior to the October Bioc release.

jwokaty commented 3 years ago

Hi @lgeistlinger, there's an issue with automatically releasing to Zenodo, so we should do a manual release of the files. (The last I heard from them, they were still working on it last week.) What date will we do the release, Oct. 25? And do we want to modify https://github.com/waldronlab/bugsigdbr to get data from Zenodo (along with the bleeding edge)?

lgeistlinger commented 3 years ago

Thanks @jwokaty. It's a great point. @lwaldron any chance you've heard from Ike with regard to progress on this? Looks like the Oct 25 deadline is a bit in danger, although it would still be great to make it :-)

lgeistlinger commented 3 years ago

And do we want to modify https://github.com/waldronlab/bugsigdbr to get data from Zenodo (along with the bleeding edge)?

And yes, that is what I think we are aiming for: being able to pull the Zenodo release (stable) as well as the continuously updated version (bleeding edge) from BugSigDBExports, as we do currently, if that makes sense.

jwokaty commented 2 years ago

Closed by #12. I manually did the release on Zenodo at https://zenodo.org/record/5606166 since the automatic mechanism still isn't working. (I should have removed the README, but you can't change the files after publishing.)

lgeistlinger commented 2 years ago

Thanks @jwokaty. That is great. I think we might have jumped the gun here a little bit though as the first release is still waiting on the fix of the ontology columns in the export. This needs to be fixed by Ike first before we can go ahead and do our first official release.

jwokaty commented 2 years ago

Thanks for clarifying.

lgeistlinger commented 2 years ago

Hi @jwokaty @lwaldron: this is finally ready for release! We finished the ontology columns in the export and everything is looking good now for upload of the stable BugSigDB 1.0 release to Zenodo. @jwokaty, can you go ahead and perform the upload to Zenodo? (Not sure whether this will involve overwriting your previous upload under https://zenodo.org/record/5606166, or whether we bump this to 1.0.1 instead.) Thanks!

lgeistlinger commented 2 years ago

The release should accordingly be based on the latest export: 1137470

jwokaty commented 2 years ago

@lgeistlinger We have to bump the version to 1.0.1. I just want to check if there should be a specific release title and any description for the release before I create the release. Also, would you like me to 'draft' the upload in Zenodo so that you can take a look before I finalize everything?

lgeistlinger commented 2 years ago

Thanks @jwokaty! I noticed a small inconsistency in the bulk export from bugsigdb.org, with some conditions / body sites present in both upper and lower case (e.g. "Feces" and "feces"). I introduced a small fix for that in 121c571. Can you trigger a manual export with that fix and base 1.0.1 on this export?
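The fix in 121c571 is not shown here; as a hypothetical illustration of one way to collapse case variants, the Python sketch below maps every value to the most frequent spelling among its case-insensitive variants (ties go to the spelling seen first). Unlike plain lower-casing, this keeps legitimately capitalized terms intact.

```python
from collections import Counter


def harmonize_case(values):
    """Map each value to the most common spelling among its case-insensitive variants."""
    counts = Counter(values)
    canonical = {}
    # most_common() orders by count; equal counts keep first-encountered order.
    for value, _count in counts.most_common():
        canonical.setdefault(value.casefold(), value)
    return [canonical[value.casefold()] for value in values]


print(harmonize_case(["Feces", "feces", "feces", "Saliva"]))
# ['feces', 'feces', 'feces', 'Saliva']
```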

if there should be a specific release title and any description for the release before I create the release

Nothing specific here from my side.

Also, would you like me to 'draft' the upload in Zenodo so that you can take a look before I finalize everything?

That sounds like a good idea!

lgeistlinger commented 2 years ago

Hi @jwokaty - I just took a look at yesterday's export (3def7c9) and everything looks good for upload / release to zenodo. Just let me know if you have any questions. Many thanks!

jwokaty commented 2 years ago

@lgeistlinger I've drafted the new version at https://zenodo.org/deposit/5819260. (I am assuming that you can see it.)

lgeistlinger commented 2 years ago

Thanks @jwokaty, logging in to Zenodo via GitHub (lgeistlinger / ludwig.geistlinger@gmail.com), I am seeing:

Permission required: You do not have sufficient permissions to view this page.

when trying to access the link you provided.

jwokaty commented 2 years ago

I apologize. I assumed that since we have access to the same things, we could all see the draft. Maybe there's no way for you to see it? I just updated all the files, except for the README, and then updated the version number to v1.0.1.

lgeistlinger commented 2 years ago

I just updated all the files, except for the README, and then updated the version number to v1.0.1.

Cool! Can you share the files with me via Google Drive or Dropbox so I can quickly review them on my end? Thanks!

jwokaty commented 2 years ago

The files are the same as in this release: https://github.com/waldronlab/BugSigDBExports/releases/tag/v1.0.1

lgeistlinger commented 2 years ago

Ah, very nice, somehow I didn't notice the releases folder/branch. Cool, I'd say we're good to upload to Zenodo and close this issue.