sheynkman-lab / Long-Read-Proteogenomics

A workflow for enhanced protein isoform detection through integration of long-read RNA-seq and mass spectrometry-based proteomics.
MIT License

Command run for MM #94

Closed (gsheynkman closed 3 years ago)

gsheynkman commented 3 years ago

@rmillikin Can you please share with me and @bj8th the command (or command template) to run MM?

Thanks so much,

Gloria

rmillikin commented 3 years ago

Pull the Docker image with:

docker pull smithchemwisc/metamorpheus:lrproteogenomics

Then generate SearchTask.toml, changing the host side of the mount (C:/Users/rmillikin/Desktop/LRPG here) to your own directory:

docker run --rm -v C:/Users/rmillikin/Desktop/LRPG:/mnt/data smithchemwisc/metamorpheus:lrproteogenomics -g -o /mnt/data

The .toml is a plain-text file. It is generated with DoParsimony = true, which you can set to false to skip protein inference. Another field, UseOrfCallingInfoInProteinInference = true, can be set to false if you do not want protein inference to use the CPM weights from the ORF table.
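If you would rather flip those fields from a script than in a text editor, something like the sketch below should work. It assumes each field sits on its own line in the form KEY = value, as quoted above. The set_toml_bool helper and the two-line stand-in file are made up for illustration; the real generated SearchTask.toml has many more fields.

```shell
#!/bin/sh
# Stand-in for the generated SearchTask.toml (assumption: the real file
# contains these two lines among many others).
toml="$(mktemp)"
printf '%s\n' \
    'DoParsimony = true' \
    'UseOrfCallingInfoInProteinInference = true' > "$toml"

# set_toml_bool FILE KEY VALUE (hypothetical helper)
# Rewrites the line "KEY = <anything>" to "KEY = VALUE", leaving other lines alone.
set_toml_bool() {
    tmp="$(mktemp)"
    # Keep everything up to and including the "=" and spacing, replace the rest.
    sed "s/^\([[:space:]]*${2}[[:space:]]*=[[:space:]]*\).*/\1${3}/" "$1" > "$tmp" \
        && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Skip protein inference, but leave the ORF weighting setting untouched.
set_toml_bool "$toml" DoParsimony false

# Show the result so it can be checked before running the search.
cat "$toml"
```

Point the helper at /your/mount/dir/SearchTask.toml instead of the temporary file once you have generated the real one.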

Then run MM with:

docker run --rm -v C:/Users/rmillikin/Desktop/LRPG:/mnt/data smithchemwisc/metamorpheus:lrproteogenomics -d /mnt/data/pacbio_clusters.fasta -s /mnt/data/jurkat_mass_spec_frac16.raw --orf /mnt/data/pacbio_clusters.tsv -t /mnt/data/SearchTask.toml -v minimal

The -d, -s, --orf, and -t flags pass in protein databases, spectra files, ORF calling tables, and .toml files, respectively. To give a flag multiple files, separate them with spaces, e.g. -d C:/MyFile1.fasta C:/MyFile2.fasta -s ...
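As a rough sketch of what a multi-file run could look like through Docker, the script below assembles the command and prints it for review rather than running it. The file names db1.fasta, db2.fasta, frac1.raw and frac2.raw are placeholders, and HOST_DIR and RUN are variables invented for this example. Paths are written as seen from inside the container (/mnt/data/...), matching the mount used in the commands above.

```shell
#!/bin/sh
# Host directory to mount at /mnt/data (assumption: defaults to the current directory).
host_dir="${HOST_DIR:-$PWD}"
image="smithchemwisc/metamorpheus:lrproteogenomics"

# Build the argument list with "set --" so every path stays a single word.
# Multiple databases and spectra files follow their flag, separated by spaces.
set -- docker run --rm -v "$host_dir:/mnt/data" "$image" \
    -d /mnt/data/db1.fasta /mnt/data/db2.fasta \
    -s /mnt/data/frac1.raw /mnt/data/frac2.raw \
    --orf /mnt/data/pacbio_clusters.tsv \
    -t /mnt/data/SearchTask.toml \
    -v minimal

# Flattened copy of the command, for display only.
cmd="$*"

# Dry run by default: print the command. Set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    printf '%s\n' "$cmd"
fi
```

Running it with RUN=1 HOST_DIR=/path/to/your/data executes the search once the printed command looks right.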