natsuhiko / rasqual

Robust Allele Specific Quantification and quality controL

2707 caQTL in RASQUAL paper #19

Open YaCui opened 5 years ago

YaCui commented 5 years ago

Dear Natsuhiko, Thanks so much for developing rasqual! Could you provide the 2707 caQTLs identified in the RASQUAL paper?

best, Ya

natsuhiko commented 5 years ago

Hi,

Here is the link to the Google Drive folder: https://drive.google.com/open?id=0B-aFDIHv9Wy3M3kwS1hPM09TRlU

You can find the peak annotation (peaks.bed.gz) as well as the peak IDs at FDR 10% (pid.fdr10.txt).

I would, however, recommend to use the latest caQTL result with 100 British samples presented in our latest paper (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics).

Best regards,

Natsuhiko

YaCui commented 5 years ago

Great! Thanks for sharing!

best, Ya

YaCui commented 5 years ago

Dear Natsuhiko, I have a small question. How should I determine the values of -l and -m? Can I just use "-l 378 -m 62" in my analysis for all features?

Thanks, Ya

natsuhiko commented 5 years ago

You need to count the appropriate number of SNPs for each feature yourself. The number of tested SNPs (-l) is relatively easy to get: count the rows of the VCF that are fed to RASQUAL (on Linux you can just use the wc command). If you have enough memory and are not sure how to count SNPs that overlap multiple features, you could set the number of feature SNPs (-m) to the number of tested SNPs.
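As a rough illustration of the counting described above, here is a minimal shell sketch. The function names `count_tested_snps` and `count_feature_snps` are hypothetical helpers, not part of RASQUAL. The second one assumes that a feature SNP is a VCF row whose POS falls inside any start-end interval, with both ends inclusive; that definition is an assumption made for this example, so check it against your own data before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of RASQUAL). Both read VCF rows from stdin.

# Number of tested SNPs (-l): every non-header row that is piped in.
count_tested_snps() {
    awk '!/^#/ { n++ } END { print n + 0 }'
}

# Number of feature SNPs (-m), under the assumption stated above:
# rows whose POS (column 2) lies inside any start-end interval (inclusive).
# $1 and $2 are comma-separated lists in the same format as -s and -e.
count_feature_snps() {
    awk -F '\t' -v starts="$1" -v ends="$2" '
        BEGIN { n_iv = split(starts, s, ","); split(ends, e, ",") }
        /^#/ { next }
        {
            for (i = 1; i <= n_iv; i++) {
                if ($2 + 0 >= s[i] + 0 && $2 + 0 <= e[i] + 0) { n++; break }
            }
        }
        END { print n + 0 }
    '
}
```

With these defined, piping the region into the first helper, as in `tabix data/chr11.gz 11:2315000-2340000 | count_tested_snps`, would give a candidate value for -l. Passing the same -s and -e lists to `count_feature_snps` would give a candidate for -m. Each SNP is counted once even if it falls inside several intervals.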

Best regards, Natsuhiko

YaCui commented 5 years ago

Dear Natsuhiko, I am a little confused about the results of Rasqual. I can get the results like "rasqual_atac_1M.gz", but how can I get the q-values in "Q.val.txt.gz"? It seems that q-values in "Q.val.txt.gz" are different from the "Log_10 Benjamini-Hochberg Q-value" in "rasqual_atac_1M.gz".

All files are from https://drive.google.com/drive/folders/0B-aFDIHv9Wy3M3kwS1hPM09TRlU.

Thanks, Ya

natsuhiko commented 5 years ago

Sorry for the confusion. The file "rasqual_atac_1M.gz" is old and the 10th column is not the Q value. This is because we provide the Q values as a separate file.

Best regards, Natsuhiko

YaCui commented 5 years ago

Hi Natsuhiko, So how can I get the Q values file? I cannot get this file if I just run the commands like below:

cd $RASQUALDIR
tabix data/chr11.gz 11:2315000-2340000 | bin/rasqual -y data/Y.bin -k data/K.bin -n 24 -j 1 -l 378 -m 62 -s 2316875,2320655,2321750,2321914,2324112 -e 2319151,2320937,2321843,2323290,2324279 -t -f C11orf21 -z

Thanks, Ya

natsuhiko commented 5 years ago

Sorry, but I don't understand your problem. I believe Q.val.txt.gz gives you the Q value for each peak in the rasqual_atac_1M.gz file.

The example command on the GitHub page is for RNA-seq, not for the ATAC-seq data we provided on Google Drive.

Best regards, Natsuhiko

YaCui commented 5 years ago

Hi Natsuhiko, Got it. Thank you so much for your help.

Thanks, Ya

plbngl commented 4 years ago

Hi Natsuhiko,

Regarding the caQTL result with 100 British samples (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics): I have your summary statistics with the probabilities, but I don't know what cutoff you use to define a caQTL, or how many there are in total. I cannot find this in the paper. Thank you very much!!!!! Paola

natsuhiko commented 4 years ago

Hi Paola,

The RASQUAL mapping result based on 24 LCLs (not 100 LCLs) is found here: https://drive.google.com/drive/folders/0B-aFDIHv9Wy3M3kwS1hPM09TRlU

The paper you cited is different. In the paper, we used 100 LCLs and performed caQTL mapping with a different approach to detect causal interactions in the genome. Because we used a Bayesian approach, we don't have "significant caQTLs" but just posterior probabilities.

Best regards, Natsuhiko

plbngl commented 4 years ago

Thank you Natsuhiko! Yes, I have been using the results from the 24 LCLs of the first study. But since you said in your comment above: "I would, however, recommend to use the latest caQTL result with 100 British samples presented in our latest paper (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics)", I thought that you had also identified caQTLs there, maybe more than with 24 samples, so I thought I would use this new study. Anyway, I can just use the results from the 24 samples! Thank you very much!! Paola