pinellolab / CRISPRme

Understanding outputs #50

Closed Huanle closed 6 months ago

Huanle commented 8 months ago

Hi @samuelecancellieri,

Thank you and your colleagues for developing such an invaluable tool.

I had a first trial with CRISPRme using Docker. It seemed to work and produced lots of results, which I have been trying to understand.

To be exact, I would love to understand the columns in *altMerge.txt.bestCFD.txt, *bestMerge.txt and final_results_*.bestMerge.txt.bestCFD.txt.*.

I am particularly interested in understanding the columns coming from the *altMerge.txt file, which are listed below.

Moreover, if I want to select the most likely off-targets, which file and which metrics should I rely on?

Thanks a lot in advance; your help will be greatly appreciated.

Columns from the *altMerge.txt file:

PAM_gen
Var_uniq
Seq_in_cluster
CFD_ref
Highest_CFD_Risk_Score
Highest_CFD_Absolute_Risk_Score
MMBLG_PAM_gen
MMBLG_CFD
MMBLG_CFD_ref
MMBLG_CFD_Risk_Score
MMBLG_CFD_Absolute_Risk_Score

samuelecancellieri commented 8 months ago

Hi @Huanle, thanks for using the tool and for the question. I'm attaching our article's supplementary material; if you look at Suppl. Table 1 you will find answers to your questions. The table refers to the integrated results file, but the reasoning behind the columns is the same and the names are basically the same. Var_uniq is called Not_found_in_REF, and it indicates whether a sequence is novel, arising only from the introduction of variants, or is just a variation of a reference sequence.

Seq_in_cluster is called Other_motifs, and is a raw count of sequences falling in the same region of the reported target.

MMBLG_ is called fewest_mm+b; it is basically a criterion we used to rank the targets without relying on the CFD or CRISTA score, by counting the number of mismatches plus bulges (mm+bulges) in the aligned target sequence and sorting by the lowest sum of the two. risk_score is also reported in the integrated results file and is the difference between CFD_ref and CFD_alt; it helps you understand whether the introduction of variants in a sequence gives you a higher or lower score (the absolute version is just the value without the sign).
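As a rough illustration of the two ideas above, here is a minimal Python sketch. It is not CRISPRme's code: the `Target` class, its field names, and the example values are all hypothetical, and the sign convention (alternative minus reference) is an assumption, since the explanation above only says the risk score is the difference between the two CFD values.

```python
# Illustrative sketch only; NOT CRISPRme's implementation.
# The Target class, field names, and values are hypothetical.
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    mismatches: int
    bulges: int
    cfd_alt: float  # CFD of the variant-containing (alternative) sequence
    cfd_ref: float  # CFD of the corresponding reference sequence


def risk_score(t):
    # ASSUMED sign convention: alt minus ref, so a positive value
    # means the variants raise the CFD score.
    return t.cfd_alt - t.cfd_ref


def absolute_risk_score(t):
    # The same difference with the sign dropped.
    return abs(risk_score(t))


def rank_by_fewest_mm_bulges(targets):
    # Rank targets by the lowest sum of mismatches and bulges,
    # independently of any CFD or CRISTA score.
    return sorted(targets, key=lambda t: t.mismatches + t.bulges)


targets = [
    Target("site_A", mismatches=3, bulges=1, cfd_alt=0.60, cfd_ref=0.20),
    Target("site_B", mismatches=1, bulges=0, cfd_alt=0.10, cfd_ref=0.30),
    Target("site_C", mismatches=2, bulges=0, cfd_alt=0.45, cfd_ref=0.45),
]

ranked = rank_by_fewest_mm_bulges(targets)
for t in ranked:
    print(t.name, t.mismatches + t.bulges,
          round(risk_score(t), 2), round(absolute_risk_score(t), 2))
```

Under the assumed convention, a score of zero (as for the hypothetical `site_C`) would mean the variants leave the CFD unchanged.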

Hope this helps. Thanks again.

41588_2022_1257_MOESM1_ESM.pdf

Huanle commented 6 months ago

Hi @samuelecancellieri,

Thanks a lot for the helpful explanation and the detailed document.

Best - Huanle