output columns explain

junyanzho commented 4 years ago

Dear VIcaller developer,

I would like to know the meaning of some output columns. In the output of the detect function, No._chimeric_reads and No._split_reads are both zero, but No._reads_supporting_VI has a value greater than 0. Is it right that supporting reads = chimeric reads + split reads? And how should these columns be interpreted?

Best, JY

xunchen85 commented 4 years ago

Yeah, there is a bug in the script, but the values will be corrected after you run the validation step.

Xun

junyanzho commented 4 years ago

Hi, xunchen,

Maybe I didn't quite understand what you said. Which columns are right? Which are wrong and get corrected by validation? I tried running the validate function on one integration with the command below:

    VIcaller.pl validate \
        -c VIcaller.config -i Sample \
        -S Sample_18_72481819_72483840_hepatitis_b_virus_21326584 \
        -G 21326584 -V hepatitis_b_virus -t 10

It ran without error, but No._chimeric_reads and No._split_reads are still 0 while No._reads_supporting_VI is 61. They still seem inconsistent.

For virus integration detection, are validation and allele fraction required? Can I run only the detection analysis?

Thanks, and hoping for your reply. JY

xunchen85 commented 4 years ago

I see. After you run the validation step, it adds a few columns, including the validation_chimeric and Validation_split columns. The sum of these columns should be "61", so it should give you the same information. My original idea was that, because some reads cannot be successfully validated, the detected chimeric and split read counts may not be correct. The sum of the validated chimeric and split reads should therefore be the main number to consider, unless you want to keep as many candidates as possible.

Regarding your other question: yes, you can run the "detect" function to detect viral integration and the "calculate" function to obtain the allele fraction.

Best, Xun

xunchen85 commented 4 years ago

I will soon correct the two columns in the main VIcaller script so that they use the original numbers.

Xun

junyanzho commented 4 years ago

Dear Xun, I didn't find columns named "validation_chimeric" and "Validation_split". For the columns with similar names, the sum is not equal to 61.

No._reads_supporting_VI	Average_alignment_score	Is_cell_line_contamination	Is_vector	Validation_chimeric_confident	Validation_chimeric_weak	Validation_chimeric_false	Validation_split_confident	Validation_split_weak	Validation_split_false
61	89.18033	-	-	100	0	0	32	0	0

Best!

JY
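The mismatch in the header and row quoted above can be checked with a short script. This is only a sketch: the column names and values are copied from the comment, while the function name `validation_totals` and the choice to sum all three categories (confident, weak, false) per read type are assumptions made for illustration, not part of VIcaller.

```python
# Column names and values copied from the output row quoted in the thread.
COLUMNS = [
    "No._reads_supporting_VI", "Average_alignment_score",
    "Is_cell_line_contamination", "Is_vector",
    "Validation_chimeric_confident", "Validation_chimeric_weak",
    "Validation_chimeric_false", "Validation_split_confident",
    "Validation_split_weak", "Validation_split_false",
]
VALUES = ["61", "89.18033", "-", "-", "100", "0", "0", "32", "0", "0"]


def validation_totals(columns, values):
    """Return (supporting, chimeric_total, split_total) for one output row.

    Assumption: every Validation_chimeric_* / Validation_split_* column
    (confident, weak, false) is added into its total.
    """
    record = dict(zip(columns, values))
    chimeric = sum(int(v) for k, v in record.items()
                   if k.startswith("Validation_chimeric_"))
    split = sum(int(v) for k, v in record.items()
                if k.startswith("Validation_split_"))
    supporting = int(record["No._reads_supporting_VI"])
    return supporting, chimeric, split


supporting, chimeric, split = validation_totals(COLUMNS, VALUES)
print(supporting, chimeric + split)  # 61 132
```

For this row the validated total is 100 + 32 = 132, which is the number that does not agree with the 61 supporting reads.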

xunchen85 commented 4 years ago

I'm wondering if you can share more info. I only see the inconsistency in the supporting reads, and it is hard for me to follow and address the potential issue without additional information. I am also not quite sure whether you ran the validation step successfully.

Validated reads are extracted from the visualization figure, so you can first check the corresponding visualization record. You can count how many unique reads there are, and how many chimeric and split reads: 61 or 132? If you are not sure, you can share the screenshot, visualization file, fuq file, and output file with me so that I can help debug it.

If you ran the script step by step, that may also cause problems, especially if you modified the script on your own.

Best, Xun

junyanzho commented 4 years ago

Hi xunchen,

I checked the sample.visualization file: the number of lines starting with 'O2' is 132, and the sequences run from 'seq0' to 'seq132'.
After the script was revised, I ran the detect function as your manual recommends, not step by step, and the run log shows no errors.

Regards, JY
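The count described above (lines beginning with 'O2') could be reproduced roughly as follows. This is a minimal sketch: the helper name `count_prefixed_lines` and the example lines are made up, and the real layout of a visualization file may differ from this stand-in.

```python
def count_prefixed_lines(lines, prefix="O2"):
    """Count the lines in an iterable that start with `prefix`."""
    return sum(1 for line in lines if line.startswith(prefix))


# Made-up stand-in for the contents of a visualization file.
example = [
    "O1 placeholder",
    "O2 seq0 placeholder",
    "O2 seq1 placeholder",
    "O3 placeholder",
]
print(count_prefixed_lines(example))  # 2

# On a real file the same helper accepts an open file handle, e.g.:
#     with open("sample.visualization") as handle:
#         print(count_prefixed_lines(handle))
```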

junyanzho commented 4 years ago

Hi xunchen, I found some information:

No._reads_supporting_VI in the output is field 39 of the file sample.virus_f2.
The validation reads are counted from the file sample.visualization, which is derived from the file sample_f2.
sample_f2 -> sample_f22 -> sample.virus_f -> sample.virus_f2 through multiple processing steps.
The total number of validation reads is 132, while No._reads_supporting_VI is 61.

Regards JY
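To look at the field mentioned above directly, something like the following sketch could be used. It assumes the file is tab-delimited and that "field 39" is counted from 1 (as in awk or cut); neither is confirmed in the thread, and the helper name `field` and the example line are hypothetical.

```python
def field(line, number, sep="\t"):
    """Return the 1-based field `number` of a delimited line.

    Assumptions: fields are separated by `sep` (tab by default) and are
    numbered from 1.
    """
    parts = line.rstrip("\n").split(sep)
    if not 1 <= number <= len(parts):
        raise IndexError(f"line has {len(parts)} fields, asked for {number}")
    return parts[number - 1]


# Made-up 40-column line in which each field holds its own position.
example_line = "\t".join(str(i) for i in range(1, 41))
print(field(example_line, 39))  # 39

# On a real file, e.g.:
#     with open("sample.virus_f2") as handle:
#         for line in handle:
#             print(field(line, 39))
```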

xunchen85 commented 4 years ago

From what you described, you may be checking or validating the wrong visualization record.

You can double-check the GI number, because if multiple candidate GIs are detected, we keep all of them. You can also check whether another record shows the same or a similar number of reads, i.e. "61".

If you use your own customized library, the viral reference name may also be the issue and cause the inconsistency.

Xun

xunchen85 / VIcaller

output columns explain #10