Open BrendanDee opened 9 months ago
df_cts.axo_xy is a prediction from the testdata df_cts.txt. df_cts.txt was a truncated table from a real data table, so it is possible that no true outliers were included. I didn't notice this problem before you told me. I will remake the testdata next week.
I'm unsure whether it's an issue with my implementation or the test data you mentioned. Regarding outlier detection, I know it's primarily done with axo_pred.py, using the LOF, IF, and OC-SVM models, where the script takes the outrider normalization counts and the p-values from Outsingle (.ogs). So far axo_pred.py selects the LOF model for other datasets (GTEx data), but it doesn't seem to detect any outliers: they all return "0" (non-aberrant) in the "label" column of data.axo_xy, and I'm currently unsure what the model deems aberrant genes / outliers. There are genes confirmed aberrant by both outsingle and outrider, yet axo_pred.py seems to detect nothing.
I'm aware that in axo_pred.py the script calculates rank_devi and cts_devi, which axolotl compares against regular gene expression to detect outliers, then generates df_cts.axo_xy, which includes the stats and whether each sample is a non-outlier or an outlier. I could use more information on the outlier detection itself and how axo_pred.py detects outliers.
First, the script axo_pred.py mainly creates a feature table with five features, named *.axo_xy.txt. The "label" column there was only used by me during method development; the 0s in "label" are ignored by the LOF model later. The outlier scores are in axo.txt rather than axo_xy.txt; I keep axo.txt in the same shape as the input file df_cts.txt.
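To make the pipeline described above concrete, here is a minimal sketch (not the actual axo_pred.py code) of fitting an LOF model on a five-feature table like *.axo_xy.txt and extracting per-row scores for an axo.txt-style output. The feature names are illustrative; only the "label" column, which the model ignores, matches the thread.

```python
# Hedged sketch: LOF over a five-feature table, "label" column ignored.
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Toy stand-in for df_cts.axo_xy: five numeric features plus "label",
# which was only used during method development and is dropped here.
df = pd.DataFrame(rng.normal(size=(100, 5)),
                  columns=[f"feat{i}" for i in range(1, 6)])
df["label"] = 0

X = df.drop(columns=["label"]).to_numpy()
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit_predict(X)

# Per-row anomaly scores; in the real pipeline these would go to axo.txt.
scores = lof.negative_outlier_factor_
print(scores.shape)
```

Note that `fit_predict`'s -1/+1 labels are not used here; only the continuous scores matter, which matches the prioritization-over-classification design described below.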
Axo doesn't provide a hard threshold value. The purpose of axo is prioritization rather than classification. You can find the details in our preprint at https://www.biorxiv.org/content/10.1101/2024.01.07.574502v1
Thank you for your feedback.
I see, so the outlier scores are obtained from axo_pred.py and saved in the axo file. The LOF model takes the devi variables (deviations) such as cts_devi and uses them to generate the anomaly scores saved in axo, which then determine the likelihood of a gene exhibiting aberrant expression. Then you manually checked whether they were genuine aberrant/outlier gene events. I hope I understood correctly.
Exactly
I'm interested in how you manually identified the genes responsible for aberrant/outlier gene events. Did you have a database you had to look through manually?
One last question: if an anomaly score is higher or lower, does it increase the likelihood of a gene being aberrant?
> I'm interested in how you manually detected the genes which are responsible for aberrant outlier gene events, did you have a database you had to manually look through?
No database. I searched related publications and collected the testdata.
> 1 last question if an anomaly score is higher or lower does it increase the likelihood of a gene being aberrant
LOF outputs are negative. A lower score means more aberrant.
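This sign convention can be verified directly with sklearn's `LocalOutlierFactor` (a minimal demonstration, not the axolotl pipeline itself): all scores are negative, and a deliberately planted outlier receives the lowest score.

```python
# Demonstration: sklearn LOF scores are negative; lower = more aberrant.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
X[0] = [8.0, 8.0]  # plant one obvious outlier far from the cloud

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit_predict(X)
scores = lof.negative_outlier_factor_

print(scores.max())            # still below 0: all scores are negative
print(int(np.argmin(scores)))  # index of the most aberrant point
```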
Hi Xu, I'm on the same project as Brendan and we are trying to re-create your methods on about 50 GTEx datasets. Would you mind sharing the distribution of anomaly scores you obtained from your GTEx datasets? Just to make sure we implemented it correctly: we are getting almost all anomaly scores between [-2, 0]. Thank you.
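For comparing score distributions across datasets without sharing full files, one option is to summarize each axo score vector with a few quantiles. This is a hedged sketch under an assumed workflow, with synthetic scores standing in for real axo output; the `summarize` helper is hypothetical, not part of axolotl.

```python
# Hedged sketch: summarize an anomaly-score vector with quantiles,
# to sanity-check a distribution like the reported [-2, 0] range.
import numpy as np

def summarize(scores):
    """Hypothetical helper: min, 5%, 50%, 95% quantiles and max."""
    qs = np.quantile(scores, [0.0, 0.05, 0.5, 0.95, 1.0])
    return dict(zip(["min", "q05", "median", "q95", "max"], qs))

# Synthetic LOF-like scores: clustered near -1 with a heavier left tail,
# loosely matching the pattern described above.
rng = np.random.default_rng(1)
scores = -1.0 - np.abs(rng.normal(scale=0.2, size=1000))
print(summarize(scores))
```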
Were there any outliers or aberrant genes in the sample data df_cts that axo_pred.py found? I recreated the environment according to the instructions, and the "label" column in df_cts.axo_xy was always 0 (the model's prediction) after running demo.sh on Ubuntu.