langmead-lab / monorail-external

examples to run monorail externally
MIT License

Importing processed data into recount3 #5

Closed daria-dc closed 3 years ago

daria-dc commented 3 years ago

@ChristopherWilks,

I am opening a new issue for this, although I am not sure if it is better to ask this in the recount3 repo.

The only information I could find about importing my own data is to set the recount3_url argument of the create_rse function in the recount3 package. However, this is not working for me out of the box.

Could you add detailed instructions or an example to your README on how to actually import the pipeline output into recount3?

Thanks!

ChristopherWilks commented 3 years ago

Hi @daria-dc

I've added a small section on loading Unifier output data into recount3: https://github.com/langmead-lab/monorail-external/blob/master/README.md#loading-custom-unifier-runs-into-recount3

Hopefully that points you in the right direction. Feel free to post followup questions here. Though, that said, my time is pretty limited these days as I'm no longer at Hopkins.

daria-dc commented 3 years ago

Hi @ChristopherWilks,

Thanks, the instructions you added are really helpful! I just want to point out that the recount_pred and recount_seq_qc metadata files that are also stored in your example folder

http://snaptron.cs.jhu.edu/data/temp/recount3test/human/data_sources/sra/metadata/42/ERP001942/

are not generated by the pipeline, as can also be seen from the make_recount3_metadata_files.sh.run file. As a result, recount3 complains that the files are missing.

Could you please have a look at this when you find the time?

ChristopherWilks commented 3 years ago

Good catch @daria-dc. While those files were generated for recount3, they were never part of Monorail proper (i.e. not an automated part). I still need to update the README, but you should be able to load your custom data into recount3 even when it complains about those two files (the messages should only be warnings).

If you find they're errors rather than warnings, I'd suggest installing the latest version of the recount3 package directly from GitHub, which turns them into warnings. In R: remotes::install_github("LieberInstitute/recount3")
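Before loading, it can help to see which of the two optional tables are absent from a study's metadata directory, so the warnings come as no surprise. The following is a minimal sketch, not part of Monorail or recount3. The helper name `missing_optional_tables` is hypothetical, and it assumes the optional files follow the same naming shape as the example's sra.recount_project.ERP001942.MD.gz, with the table name as a dot-separated part of the file name.

```python
from pathlib import Path

# Tables named in this thread as not produced by the pipeline itself.
OPTIONAL_TABLES = ("recount_pred", "recount_seq_qc")


def missing_optional_tables(study_metadata_dir):
    """Return the optional tables with no matching file in the directory.

    Assumption: file names look like <data_source>.<table>.<study>.MD.gz,
    so a table is considered present if ".<table>." occurs in a file name.
    """
    names = [p.name for p in Path(study_metadata_dir).iterdir() if p.is_file()]
    return [
        table
        for table in OPTIONAL_TABLES
        if not any(f".{table}." in name for name in names)
    ]
```

Calling `missing_optional_tables("metadata/42/ERP001942")` on a local copy of the metadata tree returns the names of whichever optional tables are not there; an empty list means both are present.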

daria-dc commented 3 years ago

Thanks @ChristopherWilks,

this resolved my issue.

This file

http://snaptron.cs.jhu.edu/data/temp/recount3test/human/data_sources/sra/metadata/sra.recount_project.MD.gz

is also not generated by the pipeline, and when it is missing, recount3 throws an error in the available_projects() function:

Error in file(file, "rt") : invalid 'description' argument

I just copied the sra.recount_project.ERP001942.MD.gz file from the subfolder, renamed the copy to sra.recount_project.MD.gz, and then the import worked.

So I will close this issue now!
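The copy-and-rename workaround described above can be sketched in Python as follows. This is an illustration, not something shipped with Monorail: the function name `promote_project_file` is hypothetical, and the directory layout (a subfolder named after the last two characters of the study ID, then the study ID) is inferred from the ERP001942 example URLs in this thread.

```python
import shutil
from pathlib import Path


def promote_project_file(metadata_root, study, data_source="sra"):
    """Copy a study's recount_project file to the top of the metadata root.

    Assumed layout (inferred from the ERP001942 example):
      <metadata_root>/<last two chars of study>/<study>/
          <data_source>.recount_project.<study>.MD.gz
    """
    root = Path(metadata_root)
    src = root / study[-2:] / study / f"{data_source}.recount_project.{study}.MD.gz"
    dst = root / f"{data_source}.recount_project.MD.gz"
    if not src.is_file():
        raise FileNotFoundError(f"per-study file not found: {src}")
    if dst.exists():
        # The top-level file lists every study in the data source, so
        # overwriting it would drop studies that are already registered.
        raise FileExistsError(f"{dst} already exists; append to it instead")
    shutil.copyfile(src, dst)
    return dst
```

For the example study this would be `promote_project_file("human/data_sources/sra/metadata", "ERP001942")`. The function refuses to overwrite an existing top-level file on purpose, since that file may already cover other studies.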

ChristopherWilks commented 3 years ago

thanks for the update @daria-dc, another good catch. I'll update the documentation.

The sra.recount_project.MD.gz file is cross-study for a given data_source, so it falls outside the run of any single study. If you run another study through Monorail (assuming the same data_source, e.g. "internal" or something), you'd want to append the new runs/samples to sra.recount_project.MD.gz rather than overwrite it, as it's the main list of all runs/studies in the data source.
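Appending rather than overwriting could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name `append_study_rows` is hypothetical, and both files are assumed to be gzipped, tab-separated text with an identical header line. If your files differ, adjust accordingly.

```python
import gzip
import os
import tempfile


def append_study_rows(master_path, study_path):
    """Append rows of a per-study recount_project file to the cross-study file.

    Assumption: both files are gzipped, tab-separated text whose first line
    is a header. Rows already present in the master file are skipped.
    Returns the number of rows added.
    """
    with gzip.open(master_path, "rt") as fh:
        master_lines = fh.read().splitlines()
    with gzip.open(study_path, "rt") as fh:
        study_lines = fh.read().splitlines()

    if not master_lines or not study_lines:
        raise ValueError("both files need at least a header line")
    if master_lines[0] != study_lines[0]:
        raise ValueError("header mismatch between master and study file")

    seen = set(master_lines[1:])
    new_rows = []
    for row in study_lines[1:]:
        if row and row not in seen:
            seen.add(row)
            new_rows.append(row)

    # Write to a temporary file first, then swap it in, so an interrupted
    # run cannot leave a truncated master file behind.
    out_dir = os.path.dirname(os.path.abspath(master_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".gz", dir=out_dir)
    os.close(fd)
    with gzip.open(tmp_path, "wt") as fh:
        fh.write("\n".join(master_lines + new_rows) + "\n")
    os.replace(tmp_path, master_path)
    return len(new_rows)
```

Running it twice with the same study file adds nothing the second time, because rows already in the master list are skipped.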

lixin4306ren commented 2 years ago

Hi, @ChristopherWilks

I used the Monorail pipeline to process my own data and have successfully created the rse file. However, without the metadata files such as recount_seq_qc and recount_pred, I can't use transform_counts for downstream analysis. How can I generate these files, as shown in the example dataset? Thanks a lot.

Xin

ChristopherWilks commented 2 years ago

@lixin4306ren the recount_seq_qc and recount_pred files are not required. You should just get a warning about them being missing when you load into recount3. Make sure you're on the latest version of recount3 for that to work, though.

You only need your project's version of the 3 files generated by your recount-unify run, similar to the example study's 3 files (though named according to your study):

http://snaptron.cs.jhu.edu/data/temp/recount3test/human/data_sources/sra/metadata/42/ERP001942/

Also, as noted above and in the README, for a single study you need to copy your study's version of the sra.recount_project..MD.gz file to the top of the metadata root and rename it sra.recount_project.gz, e.g. in the example above: http://snaptron.cs.jhu.edu/data/temp/recount3test/human/data_sources/sra/metadata/sra.recount_project.gz