vanallenlab / comut

CoMut is a Python library for visualizing genomic and phenotypic information via comutation plots
MIT License

ValueError: Unknown sample #4

Closed: mjko1210 closed this issue 4 years ago

mjko1210 commented 4 years ago

Hi, I added 3 different tracks as below:

comut = comut.CoMut()

# add ploidy information
comut.add_continuous_data(ploidy_df, name='Ploidy', mapping='Blues', cat_mapping=ploidy_mapping, value_range=(1.5, 4.5))

# add purity
comut.add_continuous_data(purity_df, name='Purity', mapping='Purples', cat_mapping=purity_mapping, value_range=(0, 1))

# add cnv
comut.add_categorical_data(cn_df, name='Copy number/type', mapping=cna_mapping, tick_style='italic', value_order=['CN_Amplification'], borders=['H3K27ac'], priority=['CN_Amplification', 'TandemDup'])

I assume it adds each track in reverse order (cnv -> purity -> ploidy). However, I'm getting the error "All added samples must be a subset of either first samples added or samples specified with comut.samples". The samples in cnv are a subset of those in purity/ploidy (not every sample with purity/ploidy has cnv data). I also specified samples with comut.samples (those with cnv available) right after creating the comut object, but it still gives me the same error. Is there any way to get around this?

The point is that I don't need to plot every sample in the ploidy/purity data. I want to visualize ploidy/purity only for samples where cnvs are available.

Thanks!

mjko1210 commented 4 years ago

Sorry, this error only happened when I specified samples. When I didn't specify sample names with comut.samples, it worked fine! Please ignore the above!

jett-crowdis commented 4 years ago

Glad you were able to resolve the error! Just in case anyone happens on this issue in the future - there are two ways to specify samples to visualize in the comut:

  1. Just add the datasets, in which case CoMut will determine samples based on the first dataset added (in your case, ploidy_df). If any later datasets have additional samples that weren't present in the first dataset, CoMut will throw an error - it won't go back and give blank ploidy values to samples that weren't present in the original ploidy_df.

  2. Specify the samples before data is added with comut.samples. Again, CoMut will throw an error if future datasets have samples that are not present in this original list. In your case, it threw an error because there were samples with ploidy data that weren't present in your initial list (based on cnvs).

In either case, the key is that the sample list CoMut uses (whether taken from the first dataset or supplied by you) must be all-inclusive: it has to contain every sample that will appear in the comut, even if some datasets lack data for some of those samples.
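
For anyone who wants to see the rule in action, here is a rough, self-contained sketch in plain Python. It does not call CoMut at all: `reference_samples`, `unknown_samples`, and `restrict_rows` are hypothetical helpers written for illustration, and the toy rows assume data in a tidy layout with a `sample` key. It models the subset check described above and shows one possible workaround for the original question, which is to filter the ploidy/purity data down to the samples that have cnv data before adding it.

```python
# Hypothetical helpers that model the sample rule described above.
# They are NOT part of CoMut's API; this is only an illustration.

def reference_samples(first_dataset_samples, specified_samples=None):
    """Return the sample list that every later dataset is checked against."""
    if specified_samples is not None:
        return list(specified_samples)
    # Preserve first-seen order while dropping duplicates.
    return list(dict.fromkeys(first_dataset_samples))


def unknown_samples(dataset_samples, reference):
    """Return samples in a dataset that are missing from the reference list."""
    allowed = set(reference)
    return [s for s in dict.fromkeys(dataset_samples) if s not in allowed]


def restrict_rows(rows, keep):
    """Keep only the rows whose 'sample' value is in `keep`."""
    keep = set(keep)
    return [row for row in rows if row["sample"] in keep]


# Toy data (made up): ploidy covers three samples, copy number only two.
ploidy_rows = [
    {"sample": "S1", "category": "Ploidy", "value": 2.1},
    {"sample": "S2", "category": "Ploidy", "value": 3.4},
    {"sample": "S3", "category": "Ploidy", "value": 1.9},
]
cn_rows = [
    {"sample": "S1", "category": "H3K27ac", "value": "CN_Amplification"},
    {"sample": "S3", "category": "H3K27ac", "value": "TandemDup"},
]

cn_samples = [row["sample"] for row in cn_rows]
ploidy_samples = [row["sample"] for row in ploidy_rows]

# Specifying only the copy-number samples makes the ploidy data fail the check.
reference = reference_samples(ploidy_samples, specified_samples=cn_samples)
print(unknown_samples(ploidy_samples, reference))  # ['S2']

# Filtering ploidy down to the copy-number samples first passes the check.
filtered_ploidy = restrict_rows(ploidy_rows, cn_samples)
print(unknown_samples([r["sample"] for r in filtered_ploidy], reference))  # []
```

With pandas DataFrames, the same filtering step would look something like `ploidy_df[ploidy_df['sample'].isin(cn_df['sample'])]`, assuming both DataFrames carry a `sample` column.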