Open js2dark opened 7 years ago
Hi @js2dark - This is a great use case for cohorts; we are happy to help.
We have some worked examples of how to combine VCFs & clinical data into a cohort object. For instance, there is an example using TCGA data, with some explanatory text, which references an earlier example for creating a cohort with clinical data only.
In all of these examples, the basic approach is the same: you loop over the units in your cohort (i.e. patients), creating a Patient object for each one. You then pass this list of Patients to create the Cohort object.
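The loop described above can be sketched as follows. This is a minimal illustration rather than code from the linked examples: the inline clinical table, its column names, and the helpers `as_bool`, `patient_kwargs` and `build_cohort` are all hypothetical. The keyword names passed to Patient and Cohort are the ones that appear elsewhere in this thread, and they may differ between cohorts versions.

```python
import csv
import io

# Hypothetical clinical table; in practice this would be read from a file.
CLINICAL_CSV = """patient_id,os_days,pfs_days,is_deceased,is_progressed,is_benefit
patient_1,70,24,True,True,False
patient_2,100,50,True,False,True
"""


def as_bool(value):
    """Interpret common textual spellings of a true value."""
    return value.strip().lower() in {"true", "1", "yes"}


def patient_kwargs(row):
    """Translate one row of the (hypothetical) table into Patient keyword arguments."""
    return {
        "id": row["patient_id"],
        "os": float(row["os_days"]),
        "pfs": float(row["pfs_days"]),
        "deceased": as_bool(row["is_deceased"]),
        "progressed": as_bool(row["is_progressed"]),
        "benefit": as_bool(row["is_benefit"]),
    }


# One dict of keyword arguments per patient.
all_kwargs = [patient_kwargs(row) for row in csv.DictReader(io.StringIO(CLINICAL_CSV))]


def build_cohort(all_kwargs, cache_dir):
    """Create one Patient per row, then pass the list to Cohort."""
    # Imported here so the table handling above runs without cohorts installed.
    from cohorts import Cohort, Patient

    patients = [Patient(**kwargs) for kwargs in all_kwargs]
    return Cohort(patients=patients, cache_dir=cache_dir)
```

Calling `build_cohort(all_kwargs, cache_dir="some/cache/dir")` would then produce the Cohort, assuming cohorts is installed and accepts these keywords in your version.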
I will say, neither of the examples above includes the use of BAMs; to include these you will want to (when creating a Patient) also create Sample objects for each of your samples (tumor &/or normal). These Sample objects then get included when creating the Patient.
For example:
normal_sample = Sample(
    is_tumor=False,
    bam_path_dna=bam_path_dna_normal)
tumor_sample = Sample(
    is_tumor=True,
    bam_path_dna=bam_path_dna_tumor,
    bam_path_rna=bam_path_rna_tumor,
    kallisto_path=kallisto_path,
    cufflinks_path=cufflinks_path)
These are then passed to the Patient object when it is instantiated:
patient = Patient(id=patient_id,
                  benefit=row["is_benefit"],
                  os=row["OS in days"],
                  pfs=row[pfs_col],  # Depends on RECIST choice
                  deceased=row["is_deceased"],
                  progressed=row["is_progressed"],
                  progressed_or_deceased=row["is_progressed_or_deceased"],
                  hla_alleles=row["hla_allele_list"],
                  vcf_paths=snv_vcf_paths,
                  normal_sample=normal_sample,  # <- here
                  tumor_sample=tumor_sample,  # <- and here
                  additional_data=row.to_dict())
NB: these examples are taken from the code we used recently to analyze some data from a cohort. Including that code here as possibly a more complete example, although beware it was using an earlier version of cohorts so some options may have changed since then.
Hope this gives you a good starting point. Feel free to get in touch if you run into sticky points or to give feedback on the documentation -- admittedly we need to do more on that front & to make these examples easier to find.
Hello Jacki,
Thank you so much for your response and help
I was able to successfully make patients and create them into a Cohort.
When I was making Patients with just clinical features such as OS, PFS, deceased, etc., I had no problems. But when I try to add the VCF path by entering either "snv_vcf_paths=..." or "vcf_paths=...", I get "TypeError: __init__() got an unexpected keyword argument 'snv_vcf_paths'" (or 'vcf_paths').
I'm sorry if these are really basic questions, as I'm still new to Python. Thank you so much for your help.
Sincerely, Jason
@js2dark happy to hear that. Sorry, the error you are seeing is my fault: the syntax changed in the latest version to variants=[vcf_path1,...]
Apologies.
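If you are unsure which keyword your installed version expects, one option is to inspect the constructor signature before building patients. The sketch below is an assumption-laden illustration: `vcf_keyword` is a hypothetical helper, and the two classes are stand-ins used only to demonstrate it, not the real cohorts Patient class. It will not help if a constructor accepts arbitrary `**kwargs`.

```python
import inspect


def vcf_keyword(patient_cls, candidates=("variants", "vcf_paths")):
    """Return the first candidate keyword that patient_cls.__init__ accepts."""
    params = inspect.signature(patient_cls.__init__).parameters
    for name in candidates:
        if name in params:
            return name
    raise TypeError(
        "none of %s is accepted by %s" % (list(candidates), patient_cls.__name__)
    )


# Stand-in classes for demonstration only (NOT the cohorts API).
class OldStylePatient:
    def __init__(self, id, vcf_paths=None):
        self.id = id
        self.vcf_paths = vcf_paths


class NewStylePatient:
    def __init__(self, id, variants=None):
        self.id = id
        self.variants = variants


keyword = vcf_keyword(NewStylePatient)
patient = NewStylePatient(id="patient_1", **{keyword: ["some.vcf"]})
```

With the real library you would pass `cohorts.Patient` instead of the stand-in class.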
Hi Jackie, thank you for your help
I got the cohort to run and got results, but for neoantigen_count I've been getting "NaN".
The code I'm running looks like this:
import pandas as pd
import numpy as np
import sys
from os import path, getcwd, environ
from cohorts import Sample, Patient, Cohort, DataFrameLoader
from cohorts.variant_stats import variant_stats_from_variant
from cohorts.functions import missense_snv_count, neoantigen_count, snv_count

patient_1 = Patient(
    id="patient_1",
    variants=["/Users/Balthazars/Desktop/Hypermutation/IRCR_GBM_352_TL_SS.mutect_rerun_filter_vep.vcf"],
    os=70, pfs=24, deceased=True, progressed=True, benefit=False)
patient_2 = Patient(
    id="patient_2",
    variants=["/Users/Balthazars/Desktop/Hypermutation/IRCR_BT15_847_T02_SS.mutect_pair_filter_vep.vcf"],
    os=100, pfs=50, deceased=True, progressed=False, benefit=True)

cohort = Cohort(patients=[patient_1, patient_2],
                cache_dir="/Users/Balthazars/Desktop/Hypermutation/Results")
df = cohort.as_dataframe(on=neoantigen_count)
df.to_csv(r'/Users/Balthazars/Desktop/Hypermutation/Results/results.csv', index=None, sep=',', mode='a')
Is it due to the absence of HLA alleles in my Patient objects? When I run the code it says "HLA alleles did not exist for patient patient_1" (and the same for patient_2). Or is another file required besides the VCF file?
If it's due to the absence of HLA alleles, is there a built-in function within cohorts for HLA typing?
Thank you so much
Sincerely, Jason
Hi @js2dark / Jason,
This looks great - happy to hear you're getting these results to run, albeit partially. Yes, the predicted neoantigen piece requires HLA types for each patient. You would need to infer these from your WES / WGS sequencing, or know them for your patients by some other means.
Unfortunately predicted neoantigen data do depend on the HLA type data. You would pass this information to the Patient objects as a list of HLA types, much as you did for other features.
Just to be clear, this would look something like the following:
Patient(id = "",
        hla_alleles = ['A*01:01',
                       'A*24:02',
                       'B*08:01',
                       'B*15:17',
                       'C*07:01',
                       'C*07:01'],
        ... )
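A quick pre-check can catch values that do not match the format shown above before they reach the predictor. This is only a sketch: `check_hla_alleles` and `HLA_PATTERN` are hypothetical names, and the pattern is an assumption that covers just the class I, four-digit notation used in the example. The parsing done inside cohorts and its dependencies may well accept other spellings.

```python
import re

# Assumed format, based only on the example above: locus A, B or C,
# an asterisk, then two 2-digit fields (optionally prefixed with "HLA-").
HLA_PATTERN = re.compile(r"^(?:HLA-)?[ABC]\*\d{2}:\d{2}$")


def check_hla_alleles(alleles):
    """Return the alleles as a list, or raise if they do not look like the example."""
    if isinstance(alleles, str):
        # A bare string such as 'A2' is not a list of alleles.
        raise TypeError("pass a list of alleles, not a single string: %r" % alleles)
    bad = [allele for allele in alleles if not HLA_PATTERN.match(allele)]
    if bad:
        raise ValueError("unexpected allele format: %r" % bad)
    return list(alleles)


alleles = check_hla_alleles(
    ["A*01:01", "A*24:02", "B*08:01", "B*15:17", "C*07:01", "C*07:01"]
)
```

The checked list can then be passed as the hla_alleles argument when creating the Patient.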
Hi Jackie,
I got the HLA type information for the patients I'm running and annotated them with hla_alleles='A2' or hla_alleles='B2' as appropriate. I've been using Python 3.6 and have updated all other packages, including mhctools, tensorflow, etc.
But it seems that "base_commandline_predictor.py" in mhctools can't process "from mhcnames.parsing_helpers import AlleleParseError".
I was wondering whether the syntax has changed in mhcnames, or whether a different version is required to run this. My mhcnames version is 0.2.1 and mhctools is 1.5.0.
Thank you Sincerely, Jason
-- Jason Kyungha Sa, Ph.D Institute for Refractory Cancer Research Samsung Medical Center
I'm going to see if I can reproduce this error you're getting - will get back to you. Thanks!
If I'm in a new python 3.5.2 session with mhcnames v 1.2.0, I see the same thing you're seeing:
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mhcnames.parsing_helpers import AlleleParseError
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'AlleleParseError'
It looks like in v 1.2.0 this should read:
from mhcnames import AlleleParseError
@js2dark can you send us a traceback from this error you're getting when you have a chance? This will help us determine where in the code this is coming up. Thanks so much!
@js2dark this issue should be fixed in the latest version of cohorts. It was caused by a conflict between the latest version of mhctools & the latest version of mhcnames.
If you do pip install git+git://github.com/hammerlab/cohorts it should be resolved. Thanks for the feedback & please let us know if you continue to run into issues.
Hi Jackie,
Below is the traceback from the error I got previously,
Traceback (most recent call last):
File "Neoantigen_cohorts.py", line 4, in
I updated cohorts through the GitHub link that you sent, and also updated mhctools to version 1.6.0 from 0.3.1 and mhcnames to 0.3.0 from 0.1.0. Now I'm getting the following errors:
Using TensorFlow backend.
Traceback (most recent call last):
File "Neoantigen_cohorts.py", line 4, in
The installed version of cohorts is 0.6.4+14.g6926523.
Do I need to use different versions of the above packages or maybe there is another issue
Thank you and hope to hear from you soon
Sincerely, Jason
Hey @js2dark,
Apologies for this being a bit confusing, but you'll actually need to use the versions of mhctools and mhcnames that cohorts now requires vs. upgrading to the latest versions of both of them. @jburos recently made a change in cohorts to pin mhcnames to 0.1.0 to solve this automatically.
If you pip install -r requirements.txt in cohorts, does that resolve the issue?
Tavi
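To confirm which versions actually ended up installed, something like the sketch below may help. It uses the standard-library `importlib.metadata`, which needs Python 3.8 or later (newer than the interpreters mentioned in this thread). The helper names `installed_versions` and `mismatches` are hypothetical, and the only pin listed is the mhcnames one mentioned above; the authoritative pins are whatever requirements.txt in cohorts says.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed, pins):
    """Return {package: (installed, wanted)} for every pin that is not met."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


# The pin mentioned in this thread; check requirements.txt for the full set.
pins = {"mhcnames": "0.1.0"}
report = mismatches(installed_versions(pins), pins)
```

An empty `report` means every listed pin is satisfied; otherwise it shows the installed version next to the expected one.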
Hi Tavi,
I ran the commands and pinned the versions; the provenance summary now reads 'provenance_file_summary': {'cohorts': '0.5.5', 'isovar': '0.7.0', 'mhctools': '0.3.1', 'numpy': '1.13.0', 'pandas': '0.20.3', 'pyensembl': '1.0.3', 'scipy': '0.19.1', 'topiary': '0.1.2', 'varcode': '0.5.15'}
but I'm getting the following errors:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/base_commandline_predictor.py", line 137, in __init__
    run_command([self.program_name])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/process_helpers.py", line 74, in run_command
    process = AsyncProcess(args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/process_helpers.py", line 47, in __init__
    self.process = Popen(args, stdout=stdout, stderr=stderr)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1326, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'netMHCcons'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Neoantigen_cohorts.py", line 14, in
Thank you
Hey @js2dark, mhctools (and therefore cohorts) expects that you have NetMHC* tools (e.g. NetMHCcons) installed; we can't install those for you for license reasons, but the download page is at: www.cbs.dtu.dk/cgi-bin/nph-sw_request?netMHCcons.
You can also configure cohorts to use other tools (via mhctools), including our open source tool, https://github.com/hammerlab/mhcflurry.
Does that help?
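Since the FileNotFoundError above comes from the netMHCcons executable not being found, a small check of your PATH before running the cohort can save a long wait. A minimal sketch using only the standard library; `missing_executables` is a hypothetical helper name, and 'netMHCcons' is the program name taken from the traceback.

```python
import shutil


def missing_executables(names):
    """Return the subset of program names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_executables(["netMHCcons"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
```

If the tool is installed but still reported as missing, its install directory probably needs to be added to the PATH of the shell that launches Python.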
Hello, I'm fairly new to Python and I've been trying to use the cohorts library, mainly to calculate neoantigens in my tumor samples. I have each tumor's processed BAM and VCF files, but I'm having a difficult time combining them into a cohort so that I can proceed to counting neoantigens. If there is a step-by-step manual for creating a cohort, I would greatly appreciate it if you could share it. Thank you, and I hope to hear from you soon.