mskcc / facets

Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.

Calling CNA from Cat genome; custom #188

Open makarov-ccf opened 1 year ago

makarov-ccf commented 1 year ago

Hello

I am trying to call CNA from a cat genome (assembly: Felis_catus_9.0 (GCA_000181335.4)). I see that for all species except human and mouse, we need to provide a GC content file. I created it as per the author's recommendations with the https://github.com/soccin/mkGCPct package. The resulting file is Felisgcpct.rda. I pass the full path to it as the ugcpct argument (ugcpct=Felisgcpct.rda), but I get an error message:

Loading required package: pctGCdata
Error in 1:nchr : result would be too long a vector
Calls: preProcSample -> counts2logROR
In addition: Warning message:
In max(out$chrom) : no non-missing arguments to max; returning -Inf
Execution halted

The genome file is attached

Thank you

Felis_catus.Felis_catus_9.0.genome.zip

veseshan commented 1 year ago

From direct messages with @makarov-ccf we found out that:

…cats have three large metacentric chromosomes (A1 to A3), four large subtelomeric chromosomes (B1 to B4), two medium-size metacentrics (C1 and C2), four small subtelomerics (D1 to D4), three small metacentrics (E1 to E3), and two small acrocentrics (F1 and F2). Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7152298/

Internally, facets uses chromosomes labeled 1:(nX-1) and "X" (nX = number of autosomes + 1). Mapping A[1-3] to 1:3, B[1-4] to 4:7, C[1-2] to 8:9, D[1-4] to 10:13, E[1-3] to 14:16 and F[1-2] to 17:18 can work.
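For illustration, the mapping above can be generated rather than typed by hand. This is only a sketch: the names GROUPS, build_chrom_map, CAT_TO_INT and INT_TO_CAT are hypothetical helpers, not part of facets, and X is left out because the mapping above covers autosomes only.

```python
# Cat autosome groups and their sizes, taken from the mapping described above.
# All names here are illustrative and are not part of facets.
GROUPS = [("A", 3), ("B", 4), ("C", 2), ("D", 4), ("E", 3), ("F", 2)]


def build_chrom_map(groups=GROUPS):
    """Return {'A1': 1, ..., 'F2': 18}, numbering chromosomes in group order."""
    names = [
        f"{letter}{i}"
        for letter, count in groups
        for i in range(1, count + 1)
    ]
    return {name: index for index, name in enumerate(names, start=1)}


CAT_TO_INT = build_chrom_map()
# Inverse mapping, for translating results back to cat chromosome names.
INT_TO_CAT = {number: name for name, number in CAT_TO_INT.items()}
```

Keeping the inverse dictionary next to the forward one makes it harder for the two directions to drift apart when results are translated back later.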

makarov-ccf commented 1 year ago

Thank you for your input, it worked. One thing to note: I had to edit the VCF file I downloaded from https://ftp.ensembl.org/pub/release-109/variation/vcf/felis_catus/ and exclude all extra chromosomes, leaving only A1, A2, A3, B1, B2, B3, B4, C1, C2, D1, D2, D3, D4, E1, E2, E3, F1, F2 (in that order); otherwise snp-pileup crashed after chromosome A3 when it encountered contigs like AANG04003642.1. Then I had to follow these steps:

Replace the chromosome names as the author suggested. The resulting pileup files look like:

Chromosome,Position,Ref,Alt,File1R,File1A,File1E,File1D,File2R,File2A,File2E,File2D
1,55308,T,C,0,23,0,24,0,25,0,20
1,73611,A,T,14,40,0,0,0,22,0,0
1,73739,A,G,6,16,0,0,0,15,0,0
1,73817,G,A,5,53,0,0,0,40,0,0

Run facets as usual, according to the author's instructions at https://github.com/mskcc/facets

Replace the chromosome names 1, 2, 3... with A1, A2, A3... again in the CN reports
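The two renaming steps above (before running facets and after it) could be scripted along these lines. This is a minimal sketch, not the procedure actually used in this thread: relabel_chromosomes is a hypothetical helper, it assumes a comma-separated file with a header row, and the name of the chromosome column in the CN reports is an assumption that may need changing through the column argument. Labels missing from the mapping are left unchanged.

```python
import csv
import io

# Chromosome names in the order used above. Numbers are kept as strings
# because values read back from a CSV file are strings too.
NAMES = "A1 A2 A3 B1 B2 B3 B4 C1 C2 D1 D2 D3 D4 E1 E2 E3 F1 F2".split()
CAT_TO_INT = {name: str(i) for i, name in enumerate(NAMES, start=1)}
INT_TO_CAT = {number: name for name, number in CAT_TO_INT.items()}


def relabel_chromosomes(csv_text, mapping, column="Chromosome"):
    """Return csv_text with the values of one column translated via mapping.

    Hypothetical helper: values that are not keys of mapping are kept as-is.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = mapping.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()


# Illustrative input only: two pileup-style rows before renaming.
pileup = "Chromosome,Position,Ref,Alt\nA1,55308,T,C\nB1,73611,A,T\n"
for_facets = relabel_chromosomes(pileup, CAT_TO_INT)      # A1 -> 1, B1 -> 4
restored = relabel_chromosomes(for_facets, INT_TO_CAT)    # 1 -> A1, 4 -> B1
```

The same function handles both directions, so only the mapping passed in differs between the pileup step and the report step.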

As a reminder, the GC content file that FACETS requires to call CNA in non-human species was created according to the author's recommendations with the https://github.com/soccin/mkGCPct package (a one-time procedure).
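The VCF clean-up described at the start of this comment (dropping every record that is not on A1 to F2) might look roughly like the sketch below. filter_vcf_lines is a hypothetical helper and not part of snp-pileup or facets. It copies header lines unchanged, keeps the input record order, and assumes the chromosome name is the first tab-separated field of each record.

```python
# Chromosomes to keep, as listed in the comment above (X is not in that list).
KEEP = "A1 A2 A3 B1 B2 B3 B4 C1 C2 D1 D2 D3 D4 E1 E2 E3 F1 F2".split()


def filter_vcf_lines(lines, keep=KEEP):
    """Yield VCF header lines plus the records whose CHROM field is in keep.

    Hypothetical helper: header lines (starting with '#') pass through
    unchanged, and records are emitted in their original order.
    """
    wanted = set(keep)
    for line in lines:
        if line.startswith("#"):
            yield line
        elif line.split("\t", 1)[0] in wanted:
            yield line


# Illustrative records only, including an unplaced contig like the one
# mentioned above.
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "A1\t55308\t.\tT\tC\n",
    "AANG04003642.1\t10\t.\tA\tG\n",
    "F2\t5\t.\tG\tA\n",
]
kept = list(filter_vcf_lines(sample))
```

For a real file, the same generator can be fed an open file handle (for example one from gzip.open in text mode) and its output written line by line, so the whole VCF never has to be held in memory.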

veseshan commented 1 year ago

Thanks @makarov-ccf for the detailed instructions.