xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary
GNU General Public License v3.0

"CSV error" when running Annotate.R #19

Closed by SheaCheng2000 1 year ago

SheaCheng2000 commented 1 year ago

Hi,

Recently I was trying to use our in-house data in STAARpipeline for a WGS burden test. When I ran Annotate.R, the log showed errors like this:

[1] 1
CSV error: record 4 (line: 5, byte: 46): found record with 2 fields, but the previous record has 1 fields
[1] 2
CSV error: record 29 (line: 30, byte: 439): found record with 2 fields, but the previous record has 1 fields
[1] 3
CSV error: record 1 (line: 2, byte: 8): found record with 2 fields, but the previous record has 1 fields
...
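To illustrate what this kind of message means: a strict CSV reader complains when a record has a different number of fields than the record before it. The sketch below is only an illustration in Python, not the code that Annotate.R or FAVORannotator runs; the function name `find_field_count_mismatches` and the sample data are made up for the example.

```python
import csv
import io

def find_field_count_mismatches(text, delimiter=","):
    """Return (record_number, line_number, n_fields, previous_n_fields)
    for each record whose field count differs from the previous record.
    Records are numbered from 0, lines from 1."""
    mismatches = []
    previous = None
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for record_number, row in enumerate(reader):
        if previous is not None and len(row) != previous:
            mismatches.append((record_number, reader.line_num, len(row), previous))
        previous = len(row)
    return mismatches

# Hypothetical single-column file in which one value contains a comma,
# so the reader sees two fields on that line.
sample = "variant\n1-100-A-G\n1-200-C-T\n1-300-G-A\n1-400-T-C,T-G\n"

for rec, line, n, prev in find_field_count_mismatches(sample):
    print(f"record {rec} (line: {line}): found record with {n} fields, "
          f"but the previous record has {prev} fields")
```

In this made-up sample, the comma inside the last value is what produces the extra field.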

However, it did not stop executing the job, and generated the output "Anno_chr1_STAARpipeline.csv".

I wonder why these errors occur (is it because of the VCF format?) and whether they would affect the final burden result.

Attached is the log file: slurm-59206.log

Looking forward to your reply! Thanks a lot!

Shea

xihaoli commented 1 year ago

Hi Shea,

According to the log file, you seem to be using the FAVOR Full Database to annotate the GDS file. However, you should use the FAVOR Essential Database to annotate the GDS file in Step 2 of FAVORannotator.

In addition, it seems that some of the dimensions do not match. For example, the genotype field in your GDS file indicates that there are 1,650,425 variants in your data, whereas the position field indicates that there are 1,611,114. These discrepancies should be fixed before running FAVORannotator.
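A dimension check of this kind can be sketched as follows. This is a minimal Python illustration that takes the per-field variant counts as plain numbers; reading them from the GDS file itself is not shown, and the helper name `check_variant_dimensions` is hypothetical.

```python
def check_variant_dimensions(field_lengths):
    """Compare every field's variant count with the first field's count.
    Returns {field_name: signed_difference} for the fields that disagree;
    an empty dict means all dimensions match."""
    names = list(field_lengths)
    reference = field_lengths[names[0]]
    return {name: n - reference
            for name, n in field_lengths.items()
            if n != reference}

# Counts quoted in this thread for the genotype and position fields.
counts = {"genotype": 1_650_425, "position": 1_611_114}
print(check_variant_dimensions(counts))  # {'position': -39311}
```

With the counts above, the position field is 39,311 entries short of the genotype field.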

Hope this helps.

Best, Xihao

SheaCheng2000 commented 1 year ago

Hi Xihao!

Thanks for your advice! I have switched to the Essential Database now. I think the inconsistency between the genotype and position fields may be a QC problem. I will try to figure it out.

Shea

xihaoli commented 1 year ago

You're welcome, Shea.

Best, Xihao

SheaCheng2000 commented 1 year ago

Hi Xihao!

Just some feedback :) The inconsistency between the genotype and position fields was caused by a QC problem: I forgot to split multiallelic variants 😢.
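For readers unfamiliar with the step: splitting a multiallelic variant means turning one site with several comma-separated ALT alleles into one biallelic record per ALT allele. The Python sketch below shows only that idea on the CHROM/POS/REF/ALT fields. It is an illustration with a hypothetical function name, not the QC procedure used here, and it does not recode genotypes or INFO fields as a dedicated tool such as `bcftools norm -m` would.

```python
def split_multiallelic(chrom, pos, ref, alt):
    """Split one site with comma-separated ALT alleles into a list of
    biallelic (chrom, pos, ref, alt) records, one per ALT allele."""
    return [(chrom, pos, ref, allele) for allele in alt.split(",")]

# Hypothetical site with two ALT alleles.
for record in split_multiallelic("1", 400, "T", "C,G"):
    print(record)
```

A site that is already biallelic passes through unchanged as a single record.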

Thanks! Shea

xihaoli commented 1 year ago

Thanks so much for letting me know, Shea.

Best, Xihao