petrelharp / local_pca

Methods for examining PCA locally along the genome.

Error in cmdscale(pc.distmat[!na.inds, !na.inds], k = opt$nmds) : NA values not allowed in 'd' #33

Closed ltalignani closed 11 months ago

ltalignani commented 11 months ago

Dear Peter,

Sorry to bother you again. I restarted the analysis on a Slurm cluster with a bash script like this:

#SBATCH -t 3-23:59:59
# Define partition
#SBATCH --partition=long
# Set number of nodes to run
#SBATCH --nodes=1
# Set number of cpus
#SBATCH -c 16
# Set memory
#SBATCH --mem=128G
# Define email for script execution
#SBATCH --mail-user=loic.talignani@ird.fr
# Define type notifications
#SBATCH --mail-type=ALL
###################################################################

echo "Load module"
module purge
module load r/4.3.1
module load bcftools/1.15.1

echo "run Local PCA for 5kb windows"
Rscript --vanilla run_lostruct.R -i data -t bp -s 5000 -I data/sample_info.tsv > lostruct-${SLURM_JOB_ID}.Rout 2>&1

As you can see, I used the -t bp and -s 5000 options. I did not get the error described below when using the -t snp -s 1000 options. The run_lostruct.R script remains unchanged.

After 15 hours, the job stops with the error message:

Error in cmdscale(pc.distmat[!na.inds, !na.inds], k = opt$nmds) : 
  NA values not allowed in 'd'
Calls: cbind -> cbind -> data.frame -> cmdscale
Execution halted

It seems to me, however, that the NAs are managed by the script and that the MDS calculation is performed without them, no? Where do you think the problem comes from?

Here's a link to download the *.pca.csv and regions.csv files, as well as the config.json file: https://filesender.renater.fr/?s=download&token=d83c40ce-8178-4fff-9d89-9cde1b3d3b2a

Thanks in advance for your help.

Best regards.

ltalignani commented 11 months ago

OK, I just changed this line: na.inds <- is.na(all.pcas[,1]) to na.inds <- is.na(pc.distmat[,1]) and it worked.
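
For anyone hitting the same error, here is a minimal sketch of why taking the mask from the distance matrix helps. It is a toy example, not the actual run_lostruct.R code: the 5-window matrix, the pts coordinates and the mds.coords object are made up for illustration, and only pc.distmat, na.inds and the cmdscale() call mirror names from the script. The thread does not confirm why some windows had NA distances while their first all.pcas entry was not NA, so treat that part as an assumption.

```r
# Toy stand-in for the window-by-window distance matrix (hypothetical data).
set.seed(1)
pts <- matrix(rnorm(10), ncol = 2)        # made-up 2-D coordinates for 5 windows
pc.distmat <- as.matrix(dist(pts))

# Pretend window 3 has undefined distances to every other window.
pc.distmat[3, ] <- NA
pc.distmat[, 3] <- NA

# Mask derived from the distance matrix itself, as in the fix above.
na.inds <- is.na(pc.distmat[, 1])

# The submatrix passed to cmdscale() now contains no NA entries.
mds <- cmdscale(pc.distmat[!na.inds, !na.inds], k = 2)

# Put the coordinates back in window order; dropped windows stay NA.
mds.coords <- matrix(NA_real_, nrow = nrow(pc.distmat), ncol = 2)
mds.coords[!na.inds, ] <- mds
```

One caveat with this mask: it relies on the first window being valid. If window 1 itself had NA distances, the whole first column would be NA and every window would be dropped.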

Regards,