thierrygosselin / assigner

Population assignment analysis using R
http://thierrygosselin.github.io/assigner
GNU General Public License v3.0

5 nodes produced errors; first error: error writing to connection #20

Closed jwhitaker17 closed 4 years ago

jwhitaker17 commented 4 years ago

Hi Thierry,

I'm getting the error message below when attempting to run assignment_ngs in RStudio. I read through the older issues and found a similar one, but no resolution was posted. Based on that issue, though, I ran detectCores(), which shows 8 cores, and then tried running the code with parallel.core = 1. When I run assignment_ngs with more cores, more nodes produce errors (5 vs. 1). Any help would be greatly appreciated. I attached the populations folder, but it would not let me attach the VCF. Thanks in advance!

Best, Justine Whitaker

STL.vcf2=read_vcf('STLpopulations.snps.vcf', strata = 'StLaw_popmapFINAL.txt', parallel.core = 1)

test2 <- assigner::assignment_ngs(
  data = STL.vcf2,
  assignment.analysis = "gsi_sim",
  markers.sampling = "ranked",
  thl = 0.2,
  iteration.method = 5,
  marker.number = c(100, 200, 300, 400, "all"),
  subsample = 20,
  iteration.subsample = 3,
  parallel.core = 1
)

assigner::assignment_ngs ############################
################################################################################
Execution date/time: 20200406@1332
Assignment analysis with gsi_sim
Folder created: assignment_analysis_method_ranked_20200406@1332
Computation time, overall: 8 sec
Calibrating REF/ALT alleles...
number of REF/ALT switch = 21
Subsampling: selected using subsample size of: 20
Analyzing subsample: 1
Conducting Assignment analysis using Training, Holdout, Leave-one-out
Using training samples to rank markers based on Fst
Holdout samples saved in your folder
Starting parallel computations, for progress monitor activity in folder...
Error in checkForRemoteErrors(val) :
  one node produced an error: 'assignment_data_iteration_1_markers_100.output.txt' does not exist in current working directory ('C:/Users/justi/Documents/Fisheries Research Revisions/assignPOP/assignment_analysis_method_ranked_20200406@1332/subsample_1/assignment_1').
In addition: Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con) :
  'package:stats' may not be available when loading
4: In serialize(data, node$con) :
  'package:stats' may not be available when loading
5: In serialize(data, node$con) :
  'package:stats' may not be available when loading
6: In serialize(data, node$con) :
  'package:stats' may not be available when loading
Computation time, overall: 118 sec
########################## assignment_ngs completed ############################

 


parallel::detectCores()
[1] 8

devtools::session_info()
Registered S3 method overwritten by 'cli':
  method     from
  print.tree tree

- Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.6.3 (2020-02-29)
 os       Windows 10 x64
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/Chicago
 date     2020-04-06

[1] C:/Users/justi/Documents/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.3/library

[StLaw_popmapFINAL.txt](https://github.com/thierrygosselin/assigner/files/4440233/StLaw_popmapFINAL.txt)

thierrygosselin commented 4 years ago

Dear Justine, please read the function documentation carefully, twice.

inside R using assigner::assignment_ngs

You're using the VCF file the wrong way; try following the example by using the file directly.

Thierry

jwhitaker17 commented 4 years ago

Hi Thierry,

Thanks for getting back to me so quickly. If I'm understanding correctly, you're saying that I need to use the tidy_genomic_data function to convert/import my data. Is that correct? I've attempted to use it, but get the same error. Also, I get the same error when I try to use the example datasets. So, while I think I understand what you were saying about my data file, there seems to be a bigger problem.

I appreciate your help.

Best,

Justine Whitaker

thierrygosselin commented 4 years ago

Could you send me via email the required files to reproduce the error? For me the example works

thierrygosselin commented 4 years ago

I'm not able to reproduce the error you have with your dataset.

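Let's go over a couple of steps before thinking about the parallel processing problems of some PCs... (they don't make it easy to work with genomic datasets):

Plan A

Make sure that it's not an install problem: reinstall assigner (http://thierrygosselin.github.io/assigner/#installation) and make sure that gsi_sim is installed properly using: assigner::install_gsi_sim()

If this doesn't fix the problem, try installing gsi_sim separately. Instructions here: https://github.com/eriqande/gsi_sim. And then run this script: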

gsi_sim -b justine_whitake_gsi_sim_input.txt --self-assign > justine_whitake_gsi_sim_output.txt

if you didn't change the name of the program:

gsi_sim-MINGW32_NT-6.1 -b justine_whitake_gsi_sim_input.txt --self-assign > justine_whitake_gsi_sim_output.txt

The input file was sent by email. To reproduce, this is what I did:

require(radiator)
data <- radiator::read_rad(data = "STLtd.rad") #the data you sent
radiator::write_gsi_sim(data = data, filename = "justine_whitake_gsi_sim_input.txt")
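The gsi_sim command above can also be launched from inside R, which keeps the whole check in one session. The following is only a minimal sketch: `run_gsi_sim` is a hypothetical helper written for illustration (it is not part of assigner, radiator or gsi_sim), and it assumes the program is on the PATH under the name passed in `program`.

```r
# Hypothetical helper (not part of assigner or gsi_sim): run gsi_sim on an
# input file and report whether a non-empty output file was produced.
run_gsi_sim <- function(input, output, program = "gsi_sim") {
  exe <- Sys.which(program)
  if (!nzchar(exe)) {
    message("Program not found on PATH: ", program)
    return(FALSE)
  }
  if (!file.exists(input)) {
    message("Input file not found: ", input)
    return(FALSE)
  }
  # Same arguments as the command line above; stdout is redirected to `output`.
  status <- system2(exe, args = c("-b", shQuote(input), "--self-assign"),
                    stdout = output)
  ok <- isTRUE(status == 0) && file.exists(output) && file.size(output) > 0
  if (!ok) message("gsi_sim exited with status ", status)
  ok
}

# File names taken from the commands above.
run_gsi_sim("justine_whitake_gsi_sim_input.txt",
            "justine_whitake_gsi_sim_output.txt")
```

If the program kept its original name, passing `program = "gsi_sim-MINGW32_NT-6.1"` would be the equivalent of the second command.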

If this is working the problem is probably the parallel processing within assigner. If this is not working it's a problem with gsi_sim on your computer.
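The `checkForRemoteErrors` and `serialize(data, node$con)` messages in the log come from socket clusters in R's parallel package, so one way to separate the two cases is to check whether a small cluster runs at all on the machine, independently of assigner. This is a sketch under that assumption; `parallel_smoke_test` is a hypothetical name, and a passing result does not prove that assigner's own parallel code will work.

```r
library(parallel)

# Start a small socket (PSOCK) cluster, the type available on Windows,
# run a trivial job on it, and always shut the workers down afterwards.
parallel_smoke_test <- function(n_workers = 2L) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  res <- parLapply(cl, 1:4, function(i) i^2)
  unlist(res)
}

# Catch the failure so the message is readable instead of aborting the script.
result <- tryCatch(
  parallel_smoke_test(),
  error = function(e) {
    message("Parallel workers failed: ", conditionMessage(e))
    NULL
  }
)
print(result)
```

A `NULL` result here would point at the machine's parallel setup (firewall, antivirus, R installation) rather than at gsi_sim.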

Plan B

Eric Anderson has also written a tidy version of gsi_sim called rubias (https://github.com/eriqande/rubias).

Plan C

Use the Cloud (https://thierrygosselin.github.io/RADseq_cloud_tutorial/)...

Plan D

Use the folder I sent yesterday. I can run something else if required.

Good luck,
Thierry

jwhitaker17 commented 4 years ago

Hi Thierry,

I reinstalled assigner (per the instructions) and gsi_sim.

assigner::install_gsi_sim()
Downloading file https://github.com/eriqande/gsi_sim/blob/080f462c8eff035fa3e9f2fdce26c3ac013e208a/gsi_sim-MINGW32_NT-6.1
And copying to C:/Users/justi/Documents/R/win-library/3.6/assigner/bin/gsi_sim
trying URL 'https://github.com/eriqande/gsi_sim/blob/080f462c8eff035fa3e9f2fdce26c3ac013e208a/gsi_sim-MINGW32_NT-6.1'
Content type 'text/html; charset=utf-8' length unknown
downloaded 63 KB
NULL
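The log above reports a content type of 'text/html' for the downloaded file, which may mean that a web page was saved in place of the gsi_sim executable. Whether that is what happened here is not confirmed in the thread. A minimal sketch of how one could check: `looks_like_html` is a hypothetical helper, and the `bin/gsi_sim` location inside the assigner package is taken from the "copying to" line of the log.

```r
# Hypothetical helper: TRUE when the first bytes of a file look like an
# HTML page rather than a compiled program.
looks_like_html <- function(path, n = 512L) {
  bytes <- readBin(path, what = "raw", n = n)
  bytes <- bytes[bytes != as.raw(0)]  # rawToChar() rejects embedded nuls
  txt <- rawToChar(bytes)
  patterns <- c("<!DOCTYPE html", "<!doctype html", "<html")
  any(vapply(patterns,
             function(p) grepl(p, txt, fixed = TRUE, useBytes = TRUE),
             logical(1)))
}

# Location assumed from the install log; system.file() returns "" when the
# package or the file is missing, so the check is skipped in that case.
gsi_path <- system.file("bin", "gsi_sim", package = "assigner")
if (nzchar(gsi_path)) print(looks_like_html(gsi_path))
```

If this printed `TRUE`, the installed file would not be runnable, and obtaining the binary some other way (for example, compiling it) would be the next thing to try.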

I tried running the practice data and got the same error.

When I tried to install gsi_sim separately, I couldn't compile it. I got the following error.

MINGW64 ~/gsi_sim (master)
$ ./Compile_gsi_sim.sh
Compiling up executable gsi_sim-MINGW64_NT-10.0-18362
./Compile_gsi_sim.sh: line 23: gcc: command not found
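The "gcc: command not found" message means the shell could not find a C compiler on its PATH. A quick way to see whether the R session can find one is sketched below with base R only. Note the assumptions: the PATH seen by R may differ from the one in the Git Bash window used above, and on Windows the compiler usually comes from Rtools, which is not mentioned in this thread.

```r
# Report whether a C compiler is visible on the PATH of this R session.
tools_needed <- c("gcc")
found <- nzchar(Sys.which(tools_needed))
names(found) <- tools_needed
print(found)
```

A `FALSE` for `gcc` would be consistent with the compile error, and installing a compiler toolchain would be needed before the script can build gsi_sim.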

Best,


Justine Whitaker

thierrygosselin commented 4 years ago

Ok so definitely something with your computer and sadly I don't offer PC troubleshooting for 2 reasons:

  1. life is too short ;)
  2. I don't have a PC, so usually if I really need one, I start an Amazon instance.

Go for plan B, C or D above.

Best,
Thierry

jwhitaker17 commented 4 years ago

Got it. Thanks for all your help!

Best,

Justine Whitaker