mskcc / tempo

CCS research pipeline to process WES and WGS TN pairs
https://cmotempo.netlify.com/

Merged netMHCpan output #503

Closed by kpjonsson 5 years ago

kpjonsson commented 5 years ago

Based on /juno/work/taylorlab/biederstedte/sandbox/mergeFacetsOutput24July2019/output/somatic/merged.netmhcpan_netmhc_combined.output.txt: This merged file has no sample name column, so rows cannot be traced back to the sample they came from and the file makes little sense as it stands.

Additionally, I know this was requested as one of the aggregated outputs. However, I reckon end users will rarely look at it, since a summarized form ends up in the final MAF. My recommendation is to not aggregate this output and to keep it at the per-sample level.

@cband @md09: Feel free to chime in.

cband commented 5 years ago

It could be left out. Alternatively, the neoantigen-pipeline could be tweaked to add a sample_id column to its output.

I can see some scenarios where end users might want to look at this file. For example, the MAF includes only the HLA allele that is the best binder, but users may want to know the binding affinities for the other alleles (say, if the best-binding allele in the MAF had an LOH). There are many scenarios like that.

In my opinion, the simplest solution for you guys would be to add a Sample ID column to the pipeline. I am happy to do it.

evanbiederstedt commented 5 years ago

> In my opinion, the simplest solution for you guys would be to add a Sample ID column to the pipeline. I am happy to do it.

Yes, I think it would be easiest to put this in the Python scripts.

https://github.com/mskcc/vaporware/blob/master/pipeline.nf#L1588-L1594

Take in the sampleID and make a column for it. I'm happy to help out too. CC @cband

Is that the best solution?

Otherwise, I could add this column quickly with AWK.
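For illustration, here is a rough Python sketch of the idea discussed above: prepend a sample ID column to each per-sample table, then concatenate the tables under a single header. This is not the code from the pipeline or the neoantigen scripts. The function names, the `sample_id` column name, and the example columns (`peptide`, `hla_allele`, `affinity`) are all hypothetical, and it assumes the per-sample outputs are tab-delimited with identical headers.

```python
import csv
import io


def add_sample_id(table_text, sample_id, column="sample_id"):
    """Return a tab-delimited table with a leading sample ID column.

    Assumes the first non-empty row of `table_text` is the header.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header_written = False
    for row in reader:
        if not row:
            continue  # skip blank lines
        if not header_written:
            writer.writerow([column] + row)
            header_written = True
        else:
            writer.writerow([sample_id] + row)
    return out.getvalue()


def merge_tables(tables):
    """Concatenate tables that share a header, keeping the header once."""
    merged = []
    header = None
    for text in tables:
        lines = text.splitlines()
        if not lines:
            continue
        if header is None:
            header = lines[0]
            merged.append(header)
        elif lines[0] != header:
            raise ValueError("per-sample tables have different headers")
        merged.extend(lines[1:])
    return "\n".join(merged) + "\n" if merged else ""


# Hypothetical per-sample outputs; real netMHCpan columns will differ.
sample_a = "peptide\thla_allele\taffinity\nAAA\tHLA-A*02:01\t12.5\n"
sample_b = "peptide\thla_allele\taffinity\nCCC\tHLA-B*07:02\t300.0\n"

merged = merge_tables([
    add_sample_id(sample_a, "sampleA"),
    add_sample_id(sample_b, "sampleB"),
])
print(merged, end="")
```

Tagging each row before the merge means the merged file can always be split back out per sample, which is the property the original merged file was missing.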

evanbiederstedt commented 5 years ago

https://github.com/mskcc/vaporware/pull/511