Closed. HenriettaHolze closed this issue 7 months ago.
Hi Henrietta,
I reimplemented the mutation code, and it is now split into two parts:
- find_variants, a mutation identification function that treats the long reads as bulk and finds mutations, returning a tibble with gene name, position, allele, read counts and a local homopolymer percentage;
- sc_mutations, a mutation caller at the single-cell level that takes positions and counts the number of reads supporting each allele at those positions for each single-cell barcode, also returning a (massive) tibble with cell barcode, allele, read count and depth columns.
Apart from native multithreading in R, I think this gives more control over the variant calling step and more information about it: you can filter potential mutations in the tibble yourself before counting reads at the single-cell level, and call mutation status for single cells with your own rules instead of some arbitrary, simplistic threshold. I wonder whether, as a user, you think this is over-complicating things or actually helpful.
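To make the "your own rules" idea concrete, here is a minimal sketch of a per-cell calling rule. It is written in plain Python purely for illustration, not against the FLAMES API: the row keys (barcode, position, allele, count, depth) only mimic the columns described above, and the function names and thresholds are hypothetical.

```python
# Hypothetical per-cell rows mimicking the described columns
# (cell barcode, allele, read count, depth). Real column names may differ.
rows = [
    {"barcode": "AAAC", "position": 1000, "allele": "T", "count": 6, "depth": 10},
    {"barcode": "AAAC", "position": 1000, "allele": "C", "count": 4, "depth": 10},
    {"barcode": "CCGT", "position": 1000, "allele": "C", "count": 8, "depth": 8},
    {"barcode": "GGTA", "position": 1000, "allele": "T", "count": 1, "depth": 2},
]


def call_status(alt_count, depth, min_depth=5, min_alt_reads=2, min_alt_frac=0.2):
    """Classify one cell at one position; all thresholds are illustrative."""
    if depth < min_depth:
        return "unknown"  # too few reads to say anything
    if alt_count >= min_alt_reads and alt_count / depth >= min_alt_frac:
        return "mutant"
    return "reference"


def call_cells(rows, alt_allele, **thresholds):
    """Return {(barcode, position): status} for the given alternative allele."""
    alt_counts = {}
    depths = {}
    for row in rows:
        key = (row["barcode"], row["position"])
        depths[key] = row["depth"]
        alt_counts.setdefault(key, 0)  # cells without alt reads still get a call
        if row["allele"] == alt_allele:
            alt_counts[key] = row["count"]
    return {
        key: call_status(alt_counts[key], depths[key], **thresholds)
        for key in depths
    }


calls = call_cells(rows, alt_allele="T")
```

Each cell gets one of three states, so low-coverage cells are kept apart from confident reference calls rather than being lumped together by a single cutoff.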
These functions are currently only available on GitHub; you can install them with BiocManager::install("mritchielab/FLAMES") to try them out.
Hi, I ran the mutation calling with FLAMES and would like to get an output file in which SNP positions are matched to the gene ID or gene symbol. This information should be available, since every read has been mapped to a gene by minimap2. It could be added, e.g., as an additional column in the allele_stat.csv.gz output file.
I ran the mutation calling using the Python functions as follows:
Lastly, it would be useful to have the gene symbol in the gene_count.csv output file of FLAMES, next to the gene ID.
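For illustration, the kind of position-to-gene matching requested above can be sketched as a post-processing step outside FLAMES. This is only a rough sketch under stated assumptions: the gene coordinates, IDs and symbols are made up, the function names are hypothetical, and it matches by annotation overlap rather than by the read-to-gene assignment FLAMES makes internally.

```python
import bisect

# Hypothetical annotation records: (chromosome, start, end, gene_id, gene_symbol),
# with 1-based inclusive coordinates. In practice these would come from a GTF/GFF.
genes = [
    ("chr1", 100, 500, "G1", "GENEA"),
    ("chr1", 400, 900, "G2", "GENEB"),
    ("chr2", 50, 300, "G3", "GENEC"),
]


def build_index(genes):
    """Group gene intervals by chromosome and sort them by start position."""
    index = {}
    for chrom, start, end, gene_id, symbol in genes:
        index.setdefault(chrom, []).append((start, end, gene_id, symbol))
    for intervals in index.values():
        intervals.sort()
    return index


def genes_at(index, chrom, pos):
    """Return (gene_id, gene_symbol) for every gene overlapping chrom:pos."""
    intervals = index.get(chrom, [])
    # Only intervals that start at or before pos can overlap it.
    hi = bisect.bisect_right(intervals, (pos, float("inf")))
    return [
        (gene_id, symbol)
        for start, end, gene_id, symbol in intervals[:hi]
        if end >= pos
    ]


index = build_index(genes)
snps = [("chr1", 450), ("chr1", 950), ("chr2", 60)]  # made-up SNP positions
annotated = [(chrom, pos, genes_at(index, chrom, pos)) for chrom, pos in snps]
```

A position inside overlapping genes gets several hits and an intergenic position gets none, so an extra gene column in the output would need a rule for both cases.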
Cheers!