toddajohnson opened 3 years ago
Hi @toddajohnson, thank you for providing this valuable feedback and proposing these solutions. I had noticed 1 and 3 during a recent run-through of mine and had the exact changes you mentioned in a branch that I hadn't yet merged with master. Like I said, we really do appreciate these solutions as we want feedback and to make RegTools as easy to implement as possible.
While getting the Example workflow to run on my data, I noticed a couple of problems and would like to propose solutions (unless I am just doing something strange).
`vcf-concat samples/*/variants.vcf.gz | vcf-sort > all_variants_sorted.vcf`
failed with "The column names do not match" (it cannot concatenate Sample1 onto Sample2). I switched to using `bcftools merge -Ou samples/*/variants.annotated.vcf.gz | bcftools sort -Oz -o all_variants_merged_sorted.vcf.gz -`, which merges records by variant and creates a separate sample column for each sample.

The R script reads the file `file_name = paste("samples/", sample, "/output/cse_identify_filtered_compare_", tag, ".tsv", sep = "")`
using `fread` and then selects the data.table columns `cse_identify_data[,.(sample,variant_info,chrom,start,end,strand,anchor,score,name,genes)]`. Since there was no `genes` column, it halted. I modified the R script with `cse_identify_data[,genes:=gene_names]`, and then the Example workflow finished.
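The `genes` fix above can also be written defensively, so it only adds the column when it is actually missing. This is a minimal sketch using data.table, assuming (as the one-line fix implies) that the table loaded by `fread` has a `gene_names` column but no `genes` column. The toy row below is hypothetical and only stands in for the real `cse_identify_filtered_compare_*.tsv` contents.

```r
library(data.table)

# Hypothetical toy table standing in for the fread() output: it has a
# gene_names column but no genes column, mimicking the failure described above.
cse_identify_data <- data.table(
  sample = "Sample1", variant_info = "chr1:100-100", chrom = "chr1",
  start = 90L, end = 200L, strand = "+", anchor = "D", score = 5L,
  name = "JUNC001", gene_names = "GENE1"
)

# Copy gene_names into a new genes column by reference (:=), but only when
# genes is absent and gene_names is present; otherwise leave the table as-is.
if (!"genes" %in% names(cse_identify_data) &&
    "gene_names" %in% names(cse_identify_data)) {
  cse_identify_data[, genes := gene_names]
}

# The column selection from the original script now succeeds.
selected <- cse_identify_data[, .(sample, variant_info, chrom, start, end,
                                  strand, anchor, score, name, genes)]
print(selected)
```

Because `:=` modifies the table in place, no reassignment is needed, and the guard makes the line a no-op on input that already has a `genes` column.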