nicolazzie / AffyPipe

an open-source pipeline for Affymetrix Axiom genotyping workflow on livestock species

About Call Rate Threshold and SNPolisher #4

Closed: bibb closed this issue 9 years ago

bibb commented 9 years ago

Dear Dr. Nicolazzi:

I'm using AffyPipe to analyze my Axiom LAT 1 GWAS array. I have followed the README recommendations and installed the latest libraries and software from the Affymetrix website. When I try to run AffyPipe, I run into two problems.

First, I think I have found a bug in the code, in line 429:

    if float(CRATE) < float(opt.CR):

opt.CR defaults to 0.97 if no custom threshold is provided. As a result, the pipeline does not flag individuals with a call rate below 97%. I ran the code and saw that the comparison uses the decimal value 0.97 rather than the percentage 97, while the call rate column in the input file AxiomGT1.report.txt is expressed in percentages, not decimals. I replaced opt.CR with the literal number 97, and the program then correctly identified the individuals below that threshold:

    if float(CRATE) < 97:
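The mismatch described above (a threshold given as a fraction, compared against a report column given in percent) can be illustrated with a small sketch. This is not AffyPipe's actual code or the fix that was later pushed; the function name `below_call_rate` and the sample values are hypothetical, and the rule "a threshold of 1 or less is a fraction" is an assumption made for illustration.

```python
def below_call_rate(call_rate_percent, threshold):
    """Return True if a sample's call rate is below the threshold.

    call_rate_percent: call rate as reported in percent (e.g. "96.4").
    threshold: either a fraction (0.97) or a percentage (97).
    """
    threshold = float(threshold)
    # Assumption: a threshold of 1 or less was given as a fraction,
    # so it is rescaled to percent before comparing.
    if threshold <= 1.0:
        threshold *= 100.0
    return float(call_rate_percent) < threshold


# Hypothetical samples with call rates in percent, as in the report file.
samples = {"sample_A": "98.2", "sample_B": "96.4"}
failed = [name for name, cr in samples.items() if below_call_rate(cr, 0.97)]
print(failed)
```

With this normalization, passing either 0.97 or 97 as the threshold flags the same samples.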

The second problem concerns the SNPolisher package. The current README says we must have version v1.5.0, but on the Affymetrix DevNet webpage the current and only version available for download is v1.5.1. After I solved the call rate issue, AffyPipe crashed at the first step of the SNPolisher part. I attached the log file to an email I just wrote to you, but the traceback error line is not in it. The error said that the script couldn't find Allele A, or something like that. It occurred right at the beginning, and I think it could be caused by the SNPolisher version, or maybe by the AffyPipe code.

I hope you can help me. Thank you in advance; I look forward to your answer.

Best regards

nicolazzie commented 9 years ago

Dear bibb, yes, your bug report was correct. I'm sorry: as AffyPipe evolved, the way Call Rate is handled changed and this slipped past me. Thanks for reporting this. I've just pushed a new version correcting this issue.

As for the second issue, I'm sorry but I did not get your email. Please run AffyPipe using the --debug option and send me the log and the error message you're getting so I can help. Yes, the SNPolisher version has changed several times now, but AffyPipe was coded to be consistent across versions (unless something BIG has changed in Affy's procedures!).

Please send me the email again so I can check this out! Again, sorry for the inconvenience and thank you for the report!

Best, Ezequiel

bibb commented 9 years ago

Hello Ezequiel,

Here are my log files. There are 2 files: one is the AffyPipe log in debug mode (--debug), and the other contains the traceback error lines from when the script stopped in the SNPolisher step.

http://www.filedropper.com/affypipelogtar

I hope you can help me,

I'll look forward to your answer.

Best regards.

--Bernabé

nicolazzie commented 9 years ago

Hi Bernabé, the error you're getting is not quite a SNPolisher error. It's in the SNPolisher "area", but it occurs in a preliminary step before going into R. I've been doing a lot of troubleshooting for similar issues in different species.

The problem is the following. Each species has its own annotation file, with a different number of columns, so my original solution of a fixed position for the alleles within the annotation file did not work for all users. I noticed that all annotation files (at least all so far...) have the header in common: AlleleA and AlleleB were always the column headers in every annotation file I've checked (and I've checked a few!). I downloaded your annotation file (Axiom GW CEU Hu), ran some tests, and identified the problem: there was a SNP that met the requirements for the header, and that's why the program was stopping. I've just pushed a patch for that, and included a more "human readable" error in case AlleleA is not found.
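The idea of locating the allele columns by header name rather than by fixed position can be sketched as follows. This is only an illustration under stated assumptions, not the patch that was pushed to AffyPipe: the function name `find_allele_columns` is hypothetical, comment lines are assumed to start with `#`, and only the first non-comment row is treated as the header so that a later data row cannot be mistaken for it.

```python
import csv
import io


def find_allele_columns(lines):
    """Return the (AlleleA, AlleleB) column indices from the header row."""
    # Assumption: comment/metadata lines start with '#'.
    reader = csv.reader(line for line in lines if not line.startswith("#"))
    header = next(reader, None)
    if header is None:
        raise ValueError("Annotation file has no header row")
    # Normalise names so 'Allele A', 'AlleleA' and 'allele_a' all match.
    norm = [h.replace(" ", "").replace("_", "").lower() for h in header]
    try:
        return norm.index("allelea"), norm.index("alleleb")
    except ValueError:
        raise ValueError(
            "AlleleA/AlleleB columns not found in header: %r" % (header,)
        ) from None


# Hypothetical annotation excerpt; real files have many more columns.
sample = io.StringIO(
    "# hypothetical comment line\n"
    '"Probe Set ID","Chromosome","Allele A","Allele B"\n'
    '"AX-0001","2","A","G"\n'
)
col_a, col_b = find_allele_columns(sample)
print(col_a, col_b)
```

Raising a `ValueError` with the header contents gives a readable message when the expected columns are missing, instead of a bare traceback.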

I'm not able to test the program as I don't have any CEL files, but it should work. Please let me know! Best wishes and Happy Easter. Ezequiel

bibb commented 9 years ago

Thank you very much. I got my new results running the corrected script, and everything went fine this time. Happy Easter!

Bernabé

nicolazzie commented 9 years ago

Cool! That's great news indeed! Thanks for your feedback!

Best wishes, Ezequiel

bibb commented 9 years ago

Hello Ezequiel

Now I'm working with the PLINK files and I've noticed that the map file is wrong. This is a sample of my file:

    2 Affx-23821302 0 4
    2 Affx-24632518 0 4
    2 Affx-24634529 0 4
    2 Affx-24054471 0 4
    2 Affx-23870609 0 4
    2 Affx-23844803 0 4
    2 Affx-14968479 0 18

The map structure is supposed to be CHR SNP CM BP, but checking the annotation file Axiom_GW_Hu_SNP.na34.annot.csv shows that the columns AffyPipe actually printed in the map file were:

  1. dbsnp loctype
  2. Affy SNP ID
  3. 0
  4. Chromosome

I tried to fix the problem manually: because the Affy SNP ID is correct, I wanted to do a lookup and correct the annotation. However, I found SNPs in my map file written like this:

--- --- 0 ---

Entries like that are tricky, or almost impossible, to trace back to the annotation file to find out which SNP they are.
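A lookup of the kind described above could be sketched by rebuilding the map rows (CHR SNP CM BP) directly from the annotation file, selecting columns by header name. This is not part of AffyPipe; the header names `Affy SNP ID`, `Chromosome` and `Physical Position`, the `#` comment prefix, and the `---` missing-value marker are assumptions that should be checked against the actual annotation file, and the IDs in the example are made up.

```python
import csv
import io

# Assumed header names; verify them in your own annotation file.
SNP_COL, CHR_COL, POS_COL = "Affy SNP ID", "Chromosome", "Physical Position"


def annotation_to_map(lines, missing="---"):
    """Build (CHR, SNP, CM, BP) rows; count records with missing fields."""
    # Assumption: comment/metadata lines start with '#'.
    reader = csv.DictReader(line for line in lines if not line.startswith("#"))
    rows, skipped = [], 0
    for rec in reader:
        snp, chrom, pos = rec[SNP_COL], rec[CHR_COL], rec[POS_COL]
        if missing in (snp, chrom, pos):
            # Cannot place this SNP on the map; count it instead of
            # writing an unusable '--- --- 0 ---' line.
            skipped += 1
            continue
        # Genetic distance (CM) is set to 0, as in the map sample above.
        rows.append((chrom, snp, "0", pos))
    return rows, skipped


# Hypothetical annotation excerpt with made-up IDs and positions.
ANNOT = io.StringIO(
    "# hypothetical comment line\n"
    '"Probe Set ID","Affy SNP ID","Chromosome","Physical Position"\n'
    '"AX-0001","Affx-0001","2","12345"\n'
    '"AX-0002","---","---","---"\n'
    '"AX-0003","Affx-0003","18","67890"\n'
)
rows, skipped = annotation_to_map(ANNOT)
for row in rows:
    print(" ".join(row))
print("skipped:", skipped)
```

Counting the records with missing fields, rather than writing them out, makes it possible to decide separately how those SNPs should be handled.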

Hope you can help me one more time. I'll look forward to your answer

Best regards Bernabé

nicolazzie commented 9 years ago

Dear Bernabé, arrgggg... not sure what is going on with this annotation file! I'm afraid the same thing that happened with the allele coding is happening again... Affy seems to be having fun changing column positions! :(

As for the specific problem you're facing, I think it can be addressed quite easily (i.e. without re-running all of AffyPipe). I'm downloading your CEU annotation file again now, so I can provide a patch (a small Python program) to solve your problem quite soon. Please send me an email at ezequiel.nicolazzi@tecnoparco.org so I can follow this closely.

In any case, I have to check this more thoroughly to keep this issue from happening again. Thanks and sorry, again! Ezequiel