Example data error

rwdavies commented 4 years ago

Hi,

I tried running the example data with the code below and got the error shown at the end of the output.

Thanks,
Robbie

set -e
tmp_dir=$(mktemp -d)  # create the scratch directory directly
cd ${tmp_dir}

echo ${tmp_dir}
git clone https://github.com/immunogenomics/HLA-TAPAS
cd HLA-TAPAS

python HLA-TAPAS.py \
    --target example/Case+Control.300+300.chr6.hg18 \
    --reference example/1000G.EUR.chr6.hg18.28mb-35mb \
    --hped-Ggroup example/1000G.EUR.Ggroup.hped \
    --pheno example/Case+Control.300+300.phe \
    --hg 18 \
    --out MyHLA-TAPAS/Case+Control+1000G_EUR_REF \
    --mem 16g \
    --nthreads 4

/tmp/tmp.GVS8KUH4KH
Cloning into 'HLA-TAPAS'...
Checking out files:  18% (45/250)   

[redacted, it checks out files for a while]

Checking out files: 100% (250/250)   
Checking out files: 100% (250/250), done.
Warning: Variants 'HLA_A*01:01:01G' and 'HLA_A*01' have the same position.
Warning: Variants 'HLA_A*02' and 'HLA_A*01:01:01G' have the same position.
Warning: Variants 'HLA_A*02:01:01G' and 'HLA_A*02' have the same position.
5296 more same-position warnings: see log file.
Namespace(aa_only=False, condition=None, condition_gene=None, condition_list=None, covar=None, covar_name=None, dependency='dependency/', exclude_composites=False, exhaustive=False, exhaustive_aa_pos=None, exhaustive_max_aa=2, exhaustive_min_aa=2, exhaustive_no_filter=False, hg='18', hped='example/1000G.EUR.Ggroup.hped', maf_threshold=0.005, mem='16g', min_haplo_count=10, niterations=5, nthreads=4, out='MyHLA-TAPAS/Case+Control+1000G_EUR_REF', output_composites=False, pcs=None, pheno='example/Case+Control.300+300.phe', pheno_name=None, pop=None, reference='example/1000G.EUR.chr6.hg18.28mb-35mb', reference_bim=None, remove_samples_aa_pattern=None, remove_samples_by_haplo=False, save_intermediates=False, sex=None, target='example/Case+Control.300+300.chr6.hg18', tolerated_diff=0.15)

[HLA-TAPAS.py]: Generating G-group CHPED with given HPED('example/1000G.EUR.Ggroup.hped').
[NomenCleaner.py]: Generating CHPED with G code HLA alleles.
[HLA-TAPAS.py]: Generated CHPED: 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.Ggroup.chped'.

[HLA-TAPAS.py]: Generating Reference panel.
[MakeReference_v2.py]: Making Reference Panel for "MyHLA-TAPAS/Case+Control+1000G_EUR_REF.REF.bglv4"
[1] Generating Amino acid(AA)sequences from HLA types.
[2] Encoding Amino acids positions.
[3] Encoding HLA alleles.
[4] Generating DNA(SNPS) sequences from HLA types.
[5] Encoding SNP positions.
[6] Extracting founders.
[7] Merging SNP, HLA, and amino acid datasets.
[8] Performing quality control.
[9] Preparing files for Beagle.
[10] Converting PLINK to BEAGLE format.
[11] Converting BEAGLE to VCF format.
[12] Phasing reference using Beagle4.1.
[13] Making reference panel for HLA-AA,SNPS,HLA and Normal variants(SNPs) is Done!
[HLA-TAPAS.py]: Generated Reference panel : 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.REF.bglv4'.

[HLA-TAPAS.py]: Performing SNP2HLA imputation.
SNP2HLA: Performing HLA imputation for dataset example/Case+Control.300+300.chr6.hg18
- Java memory = 16gb
[1] Extracting SNPs from the MHC.
[2] Performing SNP quality control.
[3] Converting PLINK to BEAGLE format.
[4] Converting BEAGLE to VCF format.
[5] Performing HLA imputation.
[HLA-TAPAS.py]: Imputed result : 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.bgl.phased.vcf.gz'

[HLAassoc.py::WARNING]: Using phenotype column 'RA' in 'example/Case+Control.300+300.phe' file.
[HLAassoc.py]: Performing Logistic Regression.
[HLA-TAPAS.py]: Output Logistic Regression result : 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.assoc.logistic'.

[HLAassoc.py]: Phased BEAGLE file will be generated from given VCF file('MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.bgl.phased.vcf.gz').
[HLAassoc.py]: Top 10 PCs will be generated from given VCF file('MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.bgl.phased.vcf.gz').
[HLAassoc.py::WARNING]: All samples will be assumed to be originated from same population.
Traceback (most recent call last):
  File "HLA-TAPAS.py", line 409, in <module>
    f_exhaustive_no_filter=args.exhaustive_no_filter
  File "HLA-TAPAS.py", line 171, in HLA_TAPAS
    _java_heap_mem=b_mem)
  File "/tmp/tmp.GVS8KUH4KH/HLA-TAPAS/HLAassoc/HLAassoc.py", line 645, in __init__
    if self.hasSEXinFAM(self.fam):
  File "/tmp/tmp.GVS8KUH4KH/HLA-TAPAS/HLAassoc/HLAassoc.py", line 985, in hasSEXinFAM
    f_NA3 = df_fam['Sex'].isna()
  File "/apps/well/python/3.5.2-gcc5.4.0/lib/python3.5/site-packages/pandas/core/generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'isna'

WansonChoi commented 3 years ago

I re-ran the example data on my system and it worked fine, so I guess this may be due to a difference in system setup.

Can you first tell me which pandas version you are using?

Traceback (most recent call last):
  File "HLA-TAPAS.py", line 409, in <module>
    f_exhaustive_no_filter=args.exhaustive_no_filter
  File "HLA-TAPAS.py", line 171, in HLA_TAPAS
    _java_heap_mem=b_mem)
  File "/tmp/tmp.GVS8KUH4KH/HLA-TAPAS/HLAassoc/HLAassoc.py", line 645, in __init__
    if self.hasSEXinFAM(self.fam):
  File "/tmp/tmp.GVS8KUH4KH/HLA-TAPAS/HLAassoc/HLAassoc.py", line 985, in hasSEXinFAM
    f_NA3 = df_fam['Sex'].isna()
  File "/apps/well/python/3.5.2-gcc5.4.0/lib/python3.5/site-packages/pandas/core/generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'isna'

In the error message, 'isna' is a pandas Series method (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isna.html), and it caused no trouble for me.

For reference, my pandas version is 1.0.5.

Thanks, Wanson
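
For anyone hitting the same AttributeError: as far as I know, Series.isna was added in pandas 0.21.0 as an alias of the older Series.isnull, so an older install would fail on the df_fam['Sex'].isna() call in the traceback. Below is a minimal sketch of a version check plus a fallback. The helper names (parse_version, has_series_isna, isna_compat) are made up for illustration and are not part of HLA-TAPAS, and the sketch deliberately avoids importing pandas so it can be tried anywhere.

```python
def parse_version(version):
    """Turn a version string such as '0.24.2' into a tuple like (0, 24, 2)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or '+local'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '0.21' to (0, 21, 0) so comparisons behave
    return tuple(parts)


def has_series_isna(version):
    """Assumption: Series.isna is available from pandas 0.21.0 onwards."""
    return parse_version(version) >= (0, 21, 0)


def isna_compat(series):
    """Call series.isna() if it exists, otherwise fall back to series.isnull()."""
    method = getattr(series, "isna", None)
    if method is None:
        method = series.isnull  # older spelling of the same check
    return method()
```

With pandas installed, has_series_isna(pandas.__version__) tells you whether the installed version is expected to have the method, and isna_compat(df_fam['Sex']) would work on either side of that boundary.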

rwdavies commented 3 years ago

I upgraded pandas to a later version, 0.24.2, since this machine only has Python 3.5.2 (I do not have system write access and didn't want to install Python manually). The pipeline now runs to the end, which looks good, but the Omnibus Test step reports a failure (see the output below), so I'm not sure this is out of the woods yet.

set -e
tmp_dir=$(mktemp -d)  # create the scratch directory directly
cd ${tmp_dir}

echo ${tmp_dir}
git clone https://github.com/immunogenomics/HLA-TAPAS
cd HLA-TAPAS

mkdir -p /well/davies/users/dcc832/bin/python_packages
pip3 install --target=/well/davies/users/dcc832/bin/python_packages/ pandas
export PYTHONPATH=/well/davies/users/dcc832/bin/python_packages/
python -c "import pandas; print(pandas.__version__)"
## note this now gives version 0.24.2

python HLA-TAPAS.py \
    --target example/Case+Control.300+300.chr6.hg18 \
    --reference example/1000G.EUR.chr6.hg18.28mb-35mb \
    --hped-Ggroup example/1000G.EUR.Ggroup.hped \
    --pheno example/Case+Control.300+300.phe \
    --hg 18 \
    --out MyHLA-TAPAS/Case+Control+1000G_EUR_REF \
    --mem 16g \
    --nthreads 4

/tmp/tmp.7ZYh27j1KG
Cloning into 'HLA-TAPAS'...
Checking out files:  26% (67/250)   
[redacted many checking out files lines] 
Checking out files: 100% (250/250), done.
Collecting pandas
  Using cached https://files.pythonhosted.org/packages/74/24/0cdbf8907e1e3bc5a8da03345c23cbed7044330bb8f73bb12e711a640a00/pandas-0.24.2-cp35-cp35m-manylinux1_x86_64.whl
Collecting numpy>=1.12.0 (from pandas)
  Using cached https://files.pythonhosted.org/packages/b5/36/88723426b4ff576809fec7d73594fe17a35c27f8d01f93637637a29ae25b/numpy-1.18.5-cp35-cp35m-manylinux1_x86_64.whl
Collecting pytz>=2011k (from pandas)
  Using cached https://files.pythonhosted.org/packages/4f/a4/879454d49688e2fad93e59d7d4efda580b783c745fd2ec2a3adf87b0808d/pytz-2020.1-py2.py3-none-any.whl
Collecting python-dateutil>=2.5.0 (from pandas)
  Using cached https://files.pythonhosted.org/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whl
Collecting six>=1.5 (from python-dateutil>=2.5.0->pandas)
  Using cached https://files.pythonhosted.org/packages/ee/ff/48bde5c0f013094d729fe4b0316ba2a24774b3ff1c52d924a8a4cb04078a/six-1.15.0-py2.py3-none-any.whl
Installing collected packages: numpy, pytz, six, python-dateutil, pandas
Successfully installed numpy-1.18.5 pandas-0.24.2 python-dateutil-2.8.1 pytz-2020.1 six-1.15.0
Target directory /well/davies/users/dcc832/bin/python_packages/numpy already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/numpy-1.18.5.dist-info already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/numpy.libs already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/pytz already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/pytz-2020.1.dist-info already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/six.py already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/six-1.15.0.dist-info already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/__pycache__ already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/dateutil already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/python_dateutil-2.8.1.dist-info already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/pandas already exists. Specify --upgrade to force replacement.
Target directory /well/davies/users/dcc832/bin/python_packages/pandas-0.24.2.dist-info already exists. Specify --upgrade to force replacement.
You are using pip version 9.0.1, however version 20.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
0.24.2
Warning: Variants 'HLA_A*01:01:01G' and 'HLA_A*01' have the same position.
Warning: Variants 'HLA_A*02' and 'HLA_A*01:01:01G' have the same position.
Warning: Variants 'HLA_A*02:01:01G' and 'HLA_A*02' have the same position.
5296 more same-position warnings: see log file.
Namespace(aa_only=False, condition=None, condition_gene=None, condition_list=None, covar=None, covar_name=None, dependency='dependency/', exclude_composites=False, exhaustive=False, exhaustive_aa_pos=None, exhaustive_max_aa=2, exhaustive_min_aa=2, exhaustive_no_filter=False, hg='18', hped='example/1000G.EUR.Ggroup.hped', maf_threshold=0.005, mem='16g', min_haplo_count=10, niterations=5, nthreads=4, out='MyHLA-TAPAS/Case+Control+1000G_EUR_REF', output_composites=False, pcs=None, pheno='example/Case+Control.300+300.phe', pheno_name=None, pop=None, reference='example/1000G.EUR.chr6.hg18.28mb-35mb', reference_bim=None, remove_samples_aa_pattern=None, remove_samples_by_haplo=False, save_intermediates=False, sex=None, target='example/Case+Control.300+300.chr6.hg18', tolerated_diff=0.15)

[HLA-TAPAS.py]: Generating G-group CHPED with given HPED('example/1000G.EUR.Ggroup.hped').
[NomenCleaner.py]: Generating CHPED with G code HLA alleles.
[HLA-TAPAS.py]: Generated CHPED: 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.Ggroup.chped'.

[HLA-TAPAS.py]: Generating Reference panel.
[MakeReference_v2.py]: Making Reference Panel for "MyHLA-TAPAS/Case+Control+1000G_EUR_REF.REF.bglv4"
[1] Generating Amino acid(AA)sequences from HLA types.
[2] Encoding Amino acids positions.
[3] Encoding HLA alleles.
[4] Generating DNA(SNPS) sequences from HLA types.
[5] Encoding SNP positions.
[6] Extracting founders.
[7] Merging SNP, HLA, and amino acid datasets.
[8] Performing quality control.
[9] Preparing files for Beagle.
[10] Converting PLINK to BEAGLE format.
[11] Converting BEAGLE to VCF format.
[12] Phasing reference using Beagle4.1.
[13] Making reference panel for HLA-AA,SNPS,HLA and Normal variants(SNPs) is Done!
[HLA-TAPAS.py]: Generated Reference panel : 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.REF.bglv4'.

[HLA-TAPAS.py]: Performing SNP2HLA imputation.
SNP2HLA: Performing HLA imputation for dataset example/Case+Control.300+300.chr6.hg18
- Java memory = 16gb
[1] Extracting SNPs from the MHC.
[2] Performing SNP quality control.
[3] Converting PLINK to BEAGLE format.
[4] Converting BEAGLE to VCF format.
[5] Performing HLA imputation.
[HLA-TAPAS.py]: Imputed result : 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.bgl.phased.vcf.gz'

[HLAassoc.py::WARNING]: Using phenotype column 'RA' in 'example/Case+Control.300+300.phe' file.
[HLAassoc.py]: Performing Logistic Regression.
[HLA-TAPAS.py]: Output Logistic Regression result : 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.assoc.logistic'.

[HLAassoc.py]: Phased BEAGLE file will be generated from given VCF file('MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.bgl.phased.vcf.gz').
[HLAassoc.py]: Top 10 PCs will be generated from given VCF file('MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.bgl.phased.vcf.gz').
[HLAassoc.py::WARNING]: All samples will be assumed to be originated from same population.
[HLAassoc.py::WARNING]: Sex information in given fam file('example/Case+Control.300+300.chr6.hg18.fam') will be used.
[HLAassoc.py]: Performing Omnibus Test.

[HLAassoc.py::ERROR]: Omnibus Test failed. See the log file('MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.OMNIBUS.OMlog').
[HLA-TAPAS.py]: Output Omnibus Test result : 'None'.

[HLA-TAPAS.py]: Plotting Manhattan Plot.
[HLA-TAPAS.py]: Manhattan plot result : 'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.assoc.logistic.manhattan.pdf'.

[HLA-TAPAS.py]: HLA-TAPAS done.
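
The run above finishes, but the Omnibus Test step reports a failure and points to a log file. Here is a small sketch for pulling that path out of the console output and printing the end of the log, assuming the error line keeps the "See the log file('...')" form shown above. The function names are hypothetical helpers, not part of HLA-TAPAS.

```python
import re
from collections import deque

# Matches the quoted path in: See the log file('path/to/file.OMlog')
LOG_PATH_PATTERN = re.compile(r"See the log file\('([^']+)'\)")


def extract_log_path(line):
    """Return the log path mentioned in an error line, or None if absent."""
    match = LOG_PATH_PATTERN.search(line)
    return match.group(1) if match else None


def tail(path, n=20):
    """Return the last n lines of a text file without trailing newlines."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


if __name__ == "__main__":
    error_line = (
        "[HLAassoc.py::ERROR]: Omnibus Test failed. See the log file("
        "'MyHLA-TAPAS/Case+Control+1000G_EUR_REF.IMPUTED.OMNIBUS.OMlog')."
    )
    print(extract_log_path(error_line))
```

Running tail() on the extracted path from inside the HLA-TAPAS directory should show whatever the Omnibus step wrote before it failed, which is likely the most useful thing to post here.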

immunogenomics / HLA-TAPAS

Example data error #5