Open jessicakan789 opened 6 months ago
Example in R:
# import libraries
library(dplyr)
library(stringr)
library(vcfR)
library(tidyr)
# parse command line arguments: args[1] = input VCF path, args[2] = output CSV path
args <- commandArgs(trailingOnly = TRUE)
# import data
vcf <- read.vcfR(args[1], verbose = FALSE)
# convert the VCF to tidy data frames and keep the fixed (per-variant) fields
vcf_df <- vcfR2tidy(vcf)
vcf_df_fix <- vcf_df$fix
# split INFO column into multiple columns
info <- extract_info_tidy(vcf)
# isolate Pangolin column
pangolin_df <- info['Pangolin']
# split the Pangolin annotation on '|' and ':' into 6 fields
# [How to Split Column Into Multiple Columns in R DataFrame? - GeeksforGeeks](https://www.geeksforgeeks.org/how-to-split-column-into-multiple-columns-in-r-dataframe/)
edit_pangolin_df <- str_split_fixed(pangolin_df$Pangolin, '[\\|:]', 6)
# change column names
colnames(edit_pangolin_df) <- c('Pangolin gene', 'Pangolin pos_1', 'Pangolin score_change_1', 'Pangolin pos_2', 'Pangolin score_change_2', 'Pangolin warnings')
# merge pangolin columns with original data
merged_df <- cbind(vcf_df_fix, edit_pangolin_df)
# write out to csv
write.csv(merged_df, args[2], row.names = FALSE)
Great tool!
A suggestion for a future release: could you give users the option to output Pangolin results in separate columns, rather than only in the INFO column of the VCF file? Having everything packed into INFO makes the results difficult to read.
Alternatively, an even better approach would be to write a VEP plugin.
Thank you! :)