dejonggr opened this issue 1 year ago
Interesting... I'm not sure why this is happening, but I would try reading in the VCF files and stepping through the genotyping function up to the marker_map part:
https://github.com/kharchenkolab/numbat/blob/main/inst/bin/pileup_and_phase.R#L214-L249
https://github.com/kharchenkolab/numbat/blob/main/R/genotyping.R#L153-L293
If you are still having trouble, let me know.
So I've run the function preprocess_allele from genotyping.R line by line on the output from the pileup and phasing directories, and marker_map contains the expected output (i.e. cM info for all SNPs). I'm assuming every step before this completed successfully, since the results are as expected.
I'm not sure what's going on, but maybe it's a problem with the gmap accessible in Singularity? The only difference is that I ran pileup_and_phase.R via singularity run, and instead of using the gmap inside the container, I used a local copy of it.
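One way to test the "local gmap copy" hypothesis is to compare the local file against the one in the container and count how many rows each chromosome has. The sketch below is a hypothetical helper, not part of numbat. It assumes the gmap is a whitespace-delimited text file (optionally gzipped) with a single header line and the chromosome label in the first column, as in the Eagle-style genetic maps; adjust `has_header` if your file differs.

```python
import gzip
import hashlib
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_chroms(lines, has_header=True):
    """Count data rows per chromosome label (first whitespace-separated field).

    Assumption: the first line is a header when has_header is True."""
    counts = Counter()
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the header line
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return dict(counts)


def gmap_chrom_counts(path, has_header=True):
    """Open a plain or .gz gmap file and count rows per chromosome."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_chroms(handle, has_header)
```

If `sha256_of` gives different digests for the local copy and the container copy, or `gmap_chrom_counts` on the local copy shows nothing (or a differently spelled label) for chromosomes 3 and up, the copied gmap would be a likely suspect. The file paths you pass in are your own; none are assumed here.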
Hi there,
I've been struggling to fix an error with your tool regarding missing pHF data. It seems numbat only keeps pHF data for chromosomes 1-2 and filters out all other SNPs, probably because the cM values for all other chromosomes are NaN. The allele counts TSV itself has SNP data for all expected genes and chromosomes, but it seems cM values are not calculated after chromosome 2.
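To confirm exactly which chromosomes lose their cM values, a per-chromosome tally of missing cM in the allele counts table can help. This is a minimal sketch, assuming the TSV has columns named `CHROM` and `cM` (both column names are assumptions; pass the real ones via `chrom_col` / `cm_col`) and that missing values are written as an empty field, `NA`, or `NaN`.

```python
import csv
import gzip
import math
from collections import defaultdict


def is_missing(value):
    """Treat None, empty strings, 'NA', NaN, and non-numeric text as missing cM."""
    if value is None:
        return True
    text = str(value).strip()
    if text in ("", "NA"):
        return True
    try:
        return math.isnan(float(text))  # float() also parses 'NaN'/'nan'
    except ValueError:
        return True  # a non-numeric cM is unusable, so count it as missing


def missing_cm_by_chrom(rows, chrom_col="CHROM", cm_col="cM"):
    """Return {chrom: (n_missing_cM, n_rows)} for an iterable of row dicts."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = counts[row[chrom_col]]
        entry[1] += 1
        if is_missing(row.get(cm_col)):
            entry[0] += 1
    return {chrom: tuple(pair) for chrom, pair in counts.items()}


def summarize_allele_counts(path, **kwargs):
    """Read a tab-separated (optionally .gz) file and tally missing cM per chromosome."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return missing_cm_by_chrom(csv.DictReader(handle, delimiter="\t"), **kwargs)
```

If the hypothesis above is right, the result would show `(0, n)` for chromosomes 1 and 2 and `(n, n)` for every other chromosome.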
Note: this seems to be an incompatibility between the phasing df and the gmap data frame, but I'm not entirely sure why or where it breaks down.
Looking at the relevant code in pileup_and_phase, I think the NaNs stem from the left join with the gmap:
But I don't understand why they aren't merging, since the VCF chromosome notation is the same for chr2 and chr3, yet chr3 isn't merging.
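A left join keeps every SNP row and fills cM with NA wherever the join keys don't match exactly, so labels that look identical when printed can still fail to match if they differ in type (integer `3` vs string `"3"`), whitespace, or prefix (`"chr3"` vs `"3"`). The sketch below illustrates that failure mode with a plain exact-key join on hypothetical `CHROM`/`POS`/`cM` fields. It is an illustration only: it does not reproduce numbat's actual join keys or any interpolation numbat may do, and `normalize_chrom` is a hypothetical fix, not a numbat function.

```python
def normalize_chrom(label):
    """Hypothetical fix: compare chromosome labels as strings without a 'chr' prefix."""
    text = str(label).strip()
    if text.lower().startswith("chr"):
        text = text[3:]
    return text


def unmatched_chroms(snp_chroms, gmap_chroms):
    """Chromosome labels present in the SNP table but absent from the gmap.

    Sorting by repr() keeps 3 (int) and '3' (str) visibly distinct."""
    return sorted(set(snp_chroms) - set(gmap_chroms), key=repr)


def left_join_cm(snps, gmap, key=("CHROM", "POS")):
    """Exact-key left join: every SNP is kept; unmatched ones get cM = None (like NA)."""
    lookup = {tuple(g[k] for k in key): g["cM"] for g in gmap}
    return [dict(row, cM=lookup.get(tuple(row[k] for k in key))) for row in snps]


def with_normalized_chrom(rows):
    """Return copies of the rows with CHROM passed through normalize_chrom."""
    return [dict(row, CHROM=normalize_chrom(row["CHROM"])) for row in rows]


if __name__ == "__main__":
    snps = [{"CHROM": "2", "POS": 100}, {"CHROM": "3", "POS": 200}]
    # In this toy gmap, chromosome 3 is stored as an int, so it prints like "3" but never matches.
    gmap = [{"CHROM": "2", "POS": 100, "cM": 0.1}, {"CHROM": 3, "POS": 200, "cM": 0.4}]
    print(left_join_cm(snps, gmap))
    print(unmatched_chroms([r["CHROM"] for r in snps], [g["CHROM"] for g in gmap]))
    print(left_join_cm(snps, with_normalized_chrom(gmap)))
```

Running `unmatched_chroms` on the CHROM columns of the real phased SNP table and gmap (for example, after exporting them from R) would show whether chromosome 3 onward differs at the key level.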
For context, here's my initial command:
Output from pileup and phase: