Closed: ewels closed this issue 3 years ago
I have found a dataset that potentially ticks several of these boxes:
Here is the link to GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120794
Here is a list of the files involved:
download.sh
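The attached download.sh isn't reproduced here. A minimal sketch of such a script, assuming the FastQ files are pulled from ENA's FTP mirror of the SRA runs; the run accession in the loop is a placeholder, not a real run from GSE120794:

```shell
#!/usr/bin/env bash
# Sketch of a GEO/SRA download script via ENA's FastQ mirror.
set -euo pipefail

ena_fastq_url() {
    # Build the ENA FTP directory for an SRA run accession:
    # 9-char accessions sit directly under their 6-char prefix,
    # 10-char accessions get an extra 00<last-digit> subdirectory.
    local acc=$1
    local prefix=${acc:0:6}
    case ${#acc} in
        9)  echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${prefix}/${acc}" ;;
        10) echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${prefix}/00${acc:9:1}/${acc}" ;;
        *)  return 1 ;;
    esac
}

# Placeholder accession; the real list comes from the SRA Run Selector
# for GSE120794. Echo the commands so the sketch is safe to run;
# drop the echo to actually download.
for acc in SRR0000001; do
    url=$(ena_fastq_url "$acc")
    echo wget "${url}/${acc}_1.fastq.gz" "${url}/${acc}_2.fastq.gz"
done
```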
That is certainly a lot of data!
Downloading now, will see how big it is. Might be a little excessive, but I like it.
You managed to fill my project partition on the cluster! Trying again now in a different project...
Definitely excessive! This has to be browsable on the website!
Well - sorry - I thought you wanted it excessive... Are you trying an entirely different project, or just a subset?
Yeah, good point about the website. Maybe 108 samples is a bit much.
Do you think we could pick maybe 3 or 4 conditions each with 2-3 replicates from this bunch? So ~9-12 samples total? Not very excessive after all really. But maybe a bit more useful to browse.
Yup, a subset would be great. Maybe we should show you an example if you haven't seen one already. Have a butcher's here
I suppose one would have to change the "mega" in "megatest", but how about something like this:
download.sh
Two cell lines as bulk samples, and one sub-condition with 3 replicates each?
Downloading now!
ok, files synced - you can view them here: https://nf-co.re/methylseq/dev/results#methylseq/input_data/
Hopefully that looks about right in terms of file sizes etc.! I'll try to put together test_full.config for the paths when I get a chance.
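A test_full.config along these lines might work. Everything below is illustrative: the input glob, sample layout, and genome are assumptions that depend on the files actually synced and on the pipeline version, not the real configuration.

```nextflow
// Sketch of a possible test_full.config for AWS megatests.
// Paths and params are assumptions, not the real uploaded file names.
params {
    config_profile_name        = 'Full test profile'
    config_profile_description = 'Full test dataset to check pipeline function'

    // FastQ files synced to the megatests bucket (glob is hypothetical)
    input = 's3://nf-core-awsmegatests/methylseq/input_data/*_R{1,2}.fastq.gz'

    // Reference genome, assuming human data
    genome = 'GRCh38'
}
```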
ok, first run going with the new data: https://tower.nf/watch/PmNwX48oJ17L6
Log in using nf-core-awstests@mailinator.com and get the login email here if you want to see (you can then share the run on to your own account for easier future reference if you want).
Tests now running for the v1.6 release - some issues with the config and run, but the test data all seems to have worked well! I think that we can close this issue.
AWS megatests is now running nicely and we're trying to set up all (most) nf-core pipelines to run a big dataset. We need to identify a set of public data to run benchmarks for the pipeline.
The idea is that this will run automatically for every release of the nf-core/methylseq pipeline. The results will then be publicly accessible from s3 and viewable through the website: https://nf-co.re/methylseq/results - this means that people can manually compare differences in output between pipeline releases if they wish.
We need a dataset that is as "normal" as possible, mouse or human, sequenced relatively recently and with a bunch of replicates etc. It can be a fairly large project.
I'm hoping that @FelixKrueger can help here, but suggestions from anyone and everyone are more than welcome!
In practical terms, once decided we need to:

- Upload the data to s3://nf-core-awsmegatests/methylseq/input_data/ (I can help with this)
- Update test_full.config to work with these file paths
- Check .github/workflows/awsfulltest.yml (should be no changes required I think?)
- Trigger the workflow on the dev branch manually
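The upload step can be sketched with the AWS CLI; the bucket path is from the issue, while the local directory name is an assumption. The command is echoed rather than executed so the sketch is safe to run without AWS credentials:

```shell
#!/usr/bin/env bash
# Sketch of syncing the chosen FastQ subset up to the megatests bucket.
set -euo pipefail

# SRC_DIR is a hypothetical local directory holding the subset of files.
SRC_DIR=input_data
BUCKET=s3://nf-core-awsmegatests/methylseq/input_data/

# Drop the echo to perform the real sync.
echo aws s3 sync "$SRC_DIR" "$BUCKET" --exclude '*' --include '*.fastq.gz'
```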