refgenie / refgenconf

A Python object for standardized reference genome assets.
http://refgenie.databio.org
BSD 2-Clause "Simplified" License

Memory use of genome digest init #105

Closed: nsheff closed this issue 3 years ago

nsheff commented 4 years ago

With the new genome digest calculation during initialization, we're now using henge in the back-end. Unfortunately, when I ran this on some real data, some fasta files (I presume the larger ones) are getting killed due to memory overages:

grep error ../genomes/submission/*.log
../genomes/submission/refgenie_Zea_mays_PH207_v1_0-fasta.log:slurmstepd: error: Detected 1 oom-kill event(s) in step 14541017.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
../genomes/submission/refgenie_Zea_mays_PH207_v1_0-fasta.log:srun: error: udc-ba27-11: task 0: Out Of Memory
../genomes/submission/refgenie_Zea_mays_PH207_v1_0-fasta.log:slurmstepd: error: Detected 1 oom-kill event(s) in step 14541017.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
../genomes/submission/refgenie_Ziziphus_jujuba__RefSeq_v1_1-fasta.log:yacman.exceptions.AliasError: Provided function '<lambda>' errored: KeyError('aliases',)
../genomes/submission/refgenie_Zostera_marina_JGI_v2_2-fasta.log:yacman.exceptions.AliasError: Provided function '<lambda>' errored: KeyError('aliases',)
../genomes/submission/refgenie_Zoysia_japonica_ssp__nagirizaki_Zoysia_Genome_Database_v1_1-fasta.log:yacman.exceptions.AliasError: Provided function '<lambda>' errored: KeyError('aliases',)

Presumably, we could compute these digests without loading the whole fasta file into memory, or otherwise make sure the memory use is more reasonable.
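
For illustration, something along these lines would keep only one chromosome in memory at a time; the file name is just an example and md5 is a stand-in for whatever digest function seqcol actually uses, so this is a sketch of the idea, not the henge implementation:

    import gzip
    import hashlib

    def stream_sequences(path, gzipped=False):
        """Yield (name, sequence) pairs one at a time, never holding the whole file."""
        opener = gzip.open if gzipped else open
        name, chunks = None, []
        with opener(path, "rt") as fh:
            for line in fh:
                line = line.strip()
                if line.startswith(">"):
                    if name is not None:
                        yield name, "".join(chunks)
                    name, chunks = line[1:].split()[0], []
                else:
                    chunks.append(line)
        if name is not None:
            yield name, "".join(chunks)

    # Digest each chromosome independently; peak memory tracks the largest
    # chromosome rather than the whole genome.
    for name, seq in stream_sequences("Zea_mays_PH207_v1_0.fa.gz", gzipped=True):
        print(name, len(seq), hashlib.md5(seq.encode()).hexdigest())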

nsheff commented 4 years ago

I've now tried these with 18 GB of memory and they are still getting out-of-memory errors (for genomes of about 3 GB). So this is a bigger problem than I thought; we really have to revise this approach to computing checksums.

stolarczyk commented 4 years ago

alright, I've spent quite some time investigating and I think I've got it...

Performance comparison

from refgenconf.seqcol import SeqColClient
ssc = SeqColClient({})
# c: genome digest, a: annotated sequence digest list (see discussion below)
c, a = ssc.load_fasta(fa_file="hg38.fa.gz", gzipped=True)

I used a gzipped hg38 FASTA (~3 GB) to compare the old and new approaches.

old:

fixed:

Fixes

There were two issues that turned out to be completely unrelated:

  1. Long execution time was caused by the pyfaidx.Fasta() call, where we read the file in and then iterated through the sequences again to process them before inserting them into the Henge database. Fixed in a670e03, where I now read the FASTA file line by line and process it on the fly, then insert into the Henge database just as before.

  2. Large memory usage was caused by lines like this (!!!!!):

https://github.com/refgenie/refgenconf/blob/d7db570ba245ee9217629d877e496f10a6dae45f/refgenconf/henge.py#L185

Python allocated a huge chunk of memory for every item it was supposed to print, even when not run in debug mode... Removing the debug lines solved the memory issue (the general gotcha is illustrated in the sketch below).
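
The general gotcha, shown with a minimal made-up example (not the actual henge code): eagerly formatting a large object inside a logging call builds the full message string even when the record is going to be discarded.

    import logging

    _LOGGER = logging.getLogger(__name__)
    logging.basicConfig(level=logging.INFO)  # DEBUG records will be discarded

    seq = "ACGT" * 50_000_000  # stand-in for a chromosome-sized string (~200 MB)

    # Eager formatting: the giant message is built *before* logging decides to
    # drop the record, so the memory is spent even though nothing is printed.
    _LOGGER.debug("inserting item: {}".format(seq))

    # Lazy formatting: the arguments are only interpolated if the record is
    # actually emitted, so nothing large is built at INFO level.
    _LOGGER.debug("inserting item: %s", seq)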

nsheff commented 4 years ago

wow, that is appalling. nice sleuthing! 2 questions:

  1. does this eliminate the need to have pyfaidx in the requirements?
  2. this raises the issue of embedding seqcol/henge here -- what's your plan for maintaining these? Just keep the development effort here for now and then eventually paste it over to those other repos?
stolarczyk commented 4 years ago
  1. yes, the old fasta parser is not required anymore, so the pyfaidx requirement is gone
  2. I didn't think about it, but this seems to be the best option for now. Alternatively, we could just update henge and seqcol every time we release a stable version of refgenconf.
nsheff commented 4 years ago

and the new approach handles gzipped fasta files (without .gz extensions) correctly, right?

https://github.com/refgenie/refgenconf/pull/102/commits/d198ea5e31c6f7f9cb33ddd9a417ad6bfb33861d

stolarczyk commented 4 years ago

yes, as long as you determine whether the file is gzipped or not upfront.
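
One common way to make that upfront determination without relying on a .gz extension is to sniff the gzip magic bytes; a small sketch (not necessarily what refgenie does internally):

    import gzip

    def is_gzipped(path):
        """Check the two-byte gzip magic number instead of trusting the extension."""
        with open(path, "rb") as fh:
            return fh.read(2) == b"\x1f\x8b"

    def open_fasta(path):
        """Open a FASTA file in text mode, transparently handling gzip."""
        return gzip.open(path, "rt") if is_gzipped(path) else open(path, "rt")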

stolarczyk commented 4 years ago

the signature and returned value did not change; the changes are internal.

nsheff commented 4 years ago

This has improved things, but I think there may be something wrong in your calculation: my jobs are being given 6 GB of memory, but they are getting killed by slurm with out-of-memory errors on genomes that use only 3 GB of disk space, even smaller than the human genome.

See this one for example:

[screenshot]

zipped fasta size: 2.8 GB
time: 3m+
memory: ~10.7 GB

the file I used is this one: /project/shefflab/www/refgenie_plantref/Triticum_turgidum_Svevo_v1_0-fasta-fasta

nsheff commented 4 years ago

Interesting -- I got similar numbers to you on the hg38 genome, though:

[screenshot]

it only used about 1/4 of the memory. Must have something to do with the size of the chromosomes? That plant genome has only 14 chromosomes.

stolarczyk commented 4 years ago

what is the size of that plant genome unzipped? hg38 is 2.9GB unzipped and 833.4MB zipped. Maybe the plant one unzipped is around 10GB and the memory scales with the file size as expected?

edit: just unzipped it and it's 9.5GB, so it makes sense

nsheff commented 4 years ago

Ah, that makes sense. Ok.

Interesting -- when you run refgenie, the memory use continues to increase; it goes past 12 GB. So some other stuff after that function call consumes more memory. Is there any way we can garbage collect or something?

[screenshot]

It seems like it should be possible to compute the genome checksum without loading the whole thing into memory. We only need to consider one chromosome at a time.

nsheff commented 4 years ago

The command I'm running is this one:

bulker run databio/refgenie:0.7.6 refgenie build Triticum_aestivum_Ensembl_Plants_TGACv1/fasta --genome-description 'Triticum aestivum Ensembl Plants TGACv1' --files fasta=/project/shefflab/www/refgenie_plantref/Triticum_aestivum_Ensembl_Plants_TGACv1-fasta-fasta

it took more than 16 minutes and more than 12 GB of memory to initialize the genome here. So most of the time, and the max memory mark, are actually outside of that function call.

So,

ssc.load_fasta:

entire initialize process:

[screenshot]

stolarczyk commented 4 years ago

are you sure you're referring to the same fasta file in both cases?

I tried building the fasta asset for Triticum_aestivum_Ensembl_Plants_TGACv1 and then running just the load_fasta method in iPython, and got exactly the same results: close to 17 minutes and 13 GB RAM usage. Here is a screenshot from the load_fasta method call:

[screenshot]

nsheff commented 4 years ago

you are correct. I was mistaken. Thanks. This explains the discrepancy.

But this point still stands: It seems like it should be possible to compute the genome checksum without loading the whole thing into memory. We only need to consider one chromosome at a time.

It's unfortunate that the initialization procedure takes upwards of 20 minutes and 14 GB of RAM for a real-world use case genome. The actual FASTA asset building step takes only 3 or 4 seconds and no RAM :). Not sure we can solve this completely, but it's worth thinking about.

stolarczyk commented 4 years ago

It seems like it should be possible to compute the genome checksum without loading the whole thing into memory. We only need to consider one chromosome at a time.

right, for this particular use case (to determine a digest and create the annotated sequence digest list that we use for genome compatibility checks) it is possible to split it by chromosome and spare memory. In general, if you wanted to store the sequences for downstream retrieval, the RAM use has to exceed the FASTA file size, unless you back the object with a DB.

So, with the recent commit the algorithm now computes the digest prior to inserting the data into the Henge database, and indeed it reduces RAM use, which is now a function of single-sequence size, not FASTA file size.

As for time, I don't think there's anything we can do. The calculations that need to happen are pretty much the same whether you consider the FASTA file as a whole or split it by sequence.
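
Roughly the shape of the per-sequence approach, as a sketch; md5 and the JSON canonicalization here are placeholders, not the actual seqcol digest scheme, and the function name is made up:

    import hashlib
    import json

    def digest_fasta(sequences):
        """Compute per-sequence digests and a top-level digest from an iterator of
        (name, sequence) pairs, holding only one sequence in memory at a time."""
        annotated = []
        for name, seq in sequences:  # e.g. fed by a streaming FASTA reader
            annotated.append({
                "name": name,
                "length": len(seq),
                "digest": hashlib.md5(seq.encode()).hexdigest(),
            })
            # `seq` is released on the next iteration, so peak RAM tracks the
            # largest single sequence, not the whole file
        collection_digest = hashlib.md5(
            json.dumps(annotated, sort_keys=True).encode()
        ).hexdigest()
        return collection_digest, annotated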

Results

For hg38 (3GB):

stolarczyk commented 4 years ago

The performance boost is even more apparent in the case of the Triticum_aestivum_Ensembl_Plants_TGACv1-fasta-fasta file, with RAM < 1 GB:

    Command being timed: "python test.py"
    User time (seconds): 591.65
    System time (seconds): 3.86
    Percent of CPU this job got: 98%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 10:02.98
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 874760
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 665638
    Voluntary context switches: 2846
    Involuntary context switches: 9604
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
nsheff commented 4 years ago

yeah I noticed -- this appears to have made things much more efficient, at least for this use case. You're right that we can't use this approach if we're actually interested in storing the sequences in the back-end -- but in that case we should still be able to insert them one at a time.

But I'm getting a bunch of new issues now with file locks; not sure if this is unrelated:

I'm seeing this in dozens of results now:

cat ../genomes/submission/refgenie_Zoysia_japonica_ssp__nagirizaki_Zoysia_Genome_Database_v1_1-fasta.log
Compute node: udc-aj37-15c1
Start time: 2020-08-27 10:15:42
The lock has been created in the split second since the last lock existence check. Waiting
Bulker config: /project/shefflab/rivanna_config/bulker.yaml
Activating crate: databio/refgenie:0.7.6

Using 'default' as the default tag for 'Zoysia_japonica_ssp__nagirizaki_Zoysia_Genome_Database_v1_1/fasta'
Recipe validated successfully against a schema: /home/ns5bc/.local/lib/python3.6/site-packages/refgenie/schemas/recipe_schema.yaml
Building 'Zoysia_japonica_ssp__nagirizaki_Zoysia_Genome_Database_v1_1/fasta:default' using 'fasta' recipe
Initializing genome: Zoysia_japonica_ssp__nagirizaki_Zoysia_Genome_Database_v1_1
Loaded AnnotatedSequenceDigestList (11786 sequences)
Traceback (most recent call last):
  File "/home/ns5bc/.local/bin/refgenie", line 10, in <module>
    sys.exit(main())
  File "/home/ns5bc/.local/lib/python3.6/site-packages/refgenie/refgenie.py", line 713, in main
    refgenie_build(gencfg, asset_list[0]["genome"], asset_list, recipe_name, args)
  File "/home/ns5bc/.local/lib/python3.6/site-packages/refgenie/refgenie.py", line 543, in refgenie_build
    alias=ori_genome)
  File "/home/ns5bc/.local/lib/python3.6/site-packages/refgenconf/refgenconf.py", line 1348, in initialize_genome
    with self as rgc:
  File "/home/ns5bc/.local/lib/python3.6/site-packages/yacman/yacman.py", line 110, in __enter__
    self.make_writable()
  File "/home/ns5bc/.local/lib/python3.6/site-packages/yacman/yacman.py", line 216, in make_writable
    create_lock(filepath, getattr(self, WAIT_MAX_KEY, DEFAULT_WAIT_TIME))
  File "/home/ns5bc/.local/lib/python3.6/site-packages/ubiquerg/files.py", line 240, in create_lock
    wait_for_lock(lock_path, wait_max)
  File "/home/ns5bc/.local/lib/python3.6/site-packages/ubiquerg/files.py", line 156, in wait_for_lock
    "file still exists.".format(wait_max))
RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
Waiting for file lock: /project/shefflab/rivanna_config/lock.bulker.yaml .Waiting for file lock: /project/shefflab/deploy/plantref/plantref/config/lock.refgenie_config.yaml ................................................................... File unlocked

is it possible that this system is locking the config file while computing the checksum?

nsheff commented 4 years ago

89 out of 156 runs failed due to this issue:

grep maximum ../genomes/submission/*.log
../genomes/submission/refgenie_Amborella_trichopoda_Amborella_V1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Amborella_trichopoda__JGI_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Ananas_comosus_JGI_v3_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Apostasia_shenzhenica_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Arabidopsis_lyrata_JGI_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Asterochloris_sp__Cgr_DA1pho_v2_0_JGI_7_45_13-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Bathycoccus_prasinos_Ghent_University-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Brassica_oleracea__Ensembl_Genomes_v2_1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Carica_papaya_Hawaii_Agriculture_Research_Center-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Cenchrus_americanus_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chenopodium_quinoa__JGI_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Citrus_clementina__JGI_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Citrus_sinensis_JGI_v1_assembly_and_v1_0_annotation-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Coccomyxa_subellipsoidea_C-169_JGI_2_0_Phytozome_10_2-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Cucumis_melo__Melonomics_v3_5-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Cucumis_sativus_L___JGI_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Dictyostelium_discoideum__ENSEMBL_protist_release_28-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Ectocarpus_siliculosus_Ghent_University-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Elaeis_guineensis_EG5_1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Emiliania_huxleyi__ENSEMBL_protist_release_28-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Glycine_max_JGI_Glyma1_1_annotation_of_the_chromosome-based_Glyma1_assembly-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Glycine_max__JGI_Wm82_a2_v1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Gossypium_raimondii_JGI_annotation_v2_1_on_assembly_v2_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Helicosporidium_sp___Illinois_University_1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Hevea_brasiliensis__NCBI_Genomes_v1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Homo_sapiens_ENSEMBL_release_81-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Hordeum_vulgare_Ensembl_Genomes_ASM32608v1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
grep maximum ../genomes/submission/*.log | wc -l
89

So, there must be something wrong with when the config file is getting locked

stolarczyk commented 4 years ago

that's odd, I'm 100% sure nothing has changed in that regard. Maybe failing jobs left the lock in place and that's the issue. I noticed there are a couple of logs with errors other than the RuntimeError; did you look into these? For example: /project/shefflab/deploy/plantref/genomes/submission/refgenie_Arabidopsis_lyrata__JGI_v2_1-fasta.log

A FileExistsError was raised there, which I need to resolve anyway, even if it's not related.

nsheff commented 4 years ago

I bet you're right that it's failing jobs that left the lock in place, or something like that. That's a problem.

nsheff commented 4 years ago

I don't know what is happening, but on a clean slate, I get the exact same set of errors:

grep maximum ../genomes/submission/*.log                            
../genomes/submission/refgenie_Brassica_oleracea__Ensembl_Genomes_v2_1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Caenorhabditis_elegans_ENSEMBL_release_81-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Capsella_rubella_JGI_annotation_v1_0_on_assembly_v1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Carica_papaya__JGI_ASGPB0_4-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chenopodium_quinoa__JGI_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chlamydomonas_reinhardtii_JGI_5_5_Phytozome_10_2-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chlamydomonas_reinhardtii_JGI_v5_0_assembly_annot_v5_3_1_based_on_Augustus_u11_6-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chlamydomonas_reinhardtii__JGI_v5_5-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.

Something is locking the file for more than 60 seconds. It's not leftover from previous runs. The file is eventually getting unlocked.

nsheff commented 4 years ago

I can confirm that in my test, the config file is getting locked during the entire build process.

stolarczyk commented 4 years ago

I narrowed it down to fasta asset builds, so it must be a side effect of some recent change in genome initialization.

stolarczyk commented 4 years ago

ok, I believe it's fixed. You can give it a try

nsheff commented 4 years ago

the issue persists for me:

grep maximum ../genomes/submission/*.log
../genomes/submission/refgenie_Cajanus_cajan__Penguin_Genomics_v5_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Capsella_rubella_JGI_annotation_v1_0_on_assembly_v1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chenopodium_quinoa__JGI_v1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chlamydomonas_reinhardtii_JGI_5_5_Phytozome_10_2-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chlamydomonas_reinhardtii__JGI_v5_5-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chlorella_sp_NC64A_JGI_1_0-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Chondrus_crispus_ENSEMBL_protists__release_28-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Citrullus_lanatus_Cucurbit_Genomics_Database-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
../genomes/submission/refgenie_Coffea_canephora__Coffee_Genome_Hub_v1-fasta.log:RuntimeError: The maximum wait time (60) has been reached and the lock file still exists.
nsheff commented 4 years ago

I'm watching the file lock -- it is not locking and unlocking immediately the way it should. Whatever is locking it is keeping it locked for seconds, sometimes minutes, at a time.

nsheff commented 4 years ago

for testing, you can use this to watch the lockfile and see when it gets replaced:

watch -n 0.2 'ls -l --time-style=full-iso config/lock.refgenie_config.yaml'
nsheff commented 4 years ago

Yes, I can confirm: in my hands, the config file is locked for the entire duration of the asset build process. This is a mistake. I am using the latest refgenie and refgenconf.

nsheff commented 4 years ago

Could it be because initialize_genome is not run in a context manager, so the config is getting locked but never unlocked after initializing the genome?
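
For intuition, here is a self-contained toy version of the pattern I'm describing; the class and method names are made up and only loosely mirror yacman's make_writable / context-manager behavior, so this is an illustration, not the real implementation:

    import os
    import time

    class LockedConfig:
        """Toy stand-in for a locked config file: the lock file exists while
        the object is writable (hypothetical, for illustration only)."""

        def __init__(self, path):
            self.path = path
            folder, name = os.path.split(os.path.abspath(path))
            self.lock_path = os.path.join(folder, "lock." + name)

        def make_writable(self, wait_max=60):
            waited = 0
            while os.path.exists(self.lock_path):  # another process holds the lock
                time.sleep(1)
                waited += 1
                if waited >= wait_max:
                    raise RuntimeError(
                        "The maximum wait time ({}) has been reached and the "
                        "lock file still exists.".format(wait_max))
            open(self.lock_path, "w").close()  # create the lock

        def make_readonly(self):
            if os.path.exists(self.lock_path):
                os.remove(self.lock_path)  # release the lock

        def __enter__(self):
            self.make_writable()
            return self

        def __exit__(self, *exc):
            self.make_readonly()  # release is guaranteed when the block exits

    cfg = LockedConfig("refgenie_config.yaml")

    # Used as a context manager, the lock exists only for the brief update:
    with cfg:
        pass  # write the config here

    # Called directly with no matching make_readonly(), the lock file stays in
    # place for the whole run, and every other job eventually hits the
    # "maximum wait time" RuntimeError while waiting on it.
    cfg.make_writable()
    # ... the long asset build would run here with the config still locked ...
    cfg.make_readonly()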

nsheff commented 4 years ago

See line: DEBU 16:59:26 | yacman.yacman:yacman:227 > Made object writable

then it goes straight into the build

DEBU 16:59:26 | refgenconf.refgenconf:refgenconf:103 > Config version is compliant: 0.4
**DEBU 16:59:26 | yacman.yacman:yacman:227 > Made object writable**
DEBU 16:59:26 | refgenconf.refgenconf:refgenconf:2242 > checking existence of: c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/None:None
DEBU 16:59:26 | attmap.attmap:attmap:105 > Transforming map-like: PathExAttMap: {}
DEBU 16:59:26 | yacman.alias:alias:228 > Added aliases (c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f: ['Nelumbo_nucifera__LOTUS-DB_v1_1'])
INFO 16:59:26 | refgenconf.refgenconf:refgenconf:1274 > Set genome alias (c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f: Nelumbo_nucifera__LOTUS-DB_v1_1)
DEBU 16:59:26 | yacman.yacman:yacman:168 > Wrote to a file: /project/shefflab/deploy/plantref/plantref/config/refgenie_config.yaml
DEBU 16:59:26 | yacman.yacman:yacman:168 > Wrote to a file: /project/shefflab/deploy/plantref/plantref/config/refgenie_config.yaml
INFO 16:59:26 | root:refgenie:413 > Saving outputs to:
- content: /project/shefflab/deploy/plantref/genomes/data/c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f
- logs: /project/shefflab/deploy/plantref/genomes/data/c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/fasta/default/_refgenie_build
DEBU 16:59:26 | logmuse:est:266 > Configured logger 'logmuse' using logmuse v0.2.6
DEBU 16:59:26 | logmuse:manager:1131 > Logger set with logmuse.logger_via_cli
Removed existing flag: '/project/shefflab/deploy/plantref/genomes/data/c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/fasta/default/_refgenie_build/refgenie_failed.flag'
INFO 16:59:26 | logmuse:manager:1134 > ### Pipeline run code and environment:

INFO 16:59:26 | logmuse:manager:1134 > *              Command:  `/home/ns5bc/.local/bin/refgenie --verb 5 build Nelumbo_nucifera__LOTUS-DB_v1_1/fasta --genome-description Nelumbo nucifera  LOTUS-DB v1.1 --files fasta=/project/shefflab/www/refgenie_plantref/Nelumbo_nucifera__LOTUS-DB_v1_1-fasta-fasta -R`
INFO 16:59:26 | logmuse:manager:1134 > *         Compute host:  udc-ba34-36
INFO 16:59:26 | logmuse:manager:1134 > *          Working dir:  /sfs/qumulo/qproject/shefflab/deploy/plantref/plantref
INFO 16:59:26 | logmuse:manager:1134 > *            Outfolder:  /project/shefflab/deploy/plantref/genomes/data/c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/fasta/default/_refgenie_build/
INFO 16:59:26 | logmuse:manager:1134 > *  Pipeline started at:   (08-27 16:59:26) elapsed: 0.0 _TIME_
INFO 16:59:26 | logmuse:manager:1134 >
### Version log:

INFO 16:59:26 | logmuse:manager:1134 > *       Python version:  3.6.6
INFO 16:59:26 | logmuse:manager:1134 > *          Pypiper dir:  `/sfs/qumulo/qhome/ns5bc/.local/lib/python3.6/site-packages/pypiper`
INFO 16:59:26 | logmuse:manager:1134 > *      Pypiper version:  0.12.1
INFO 16:59:26 | logmuse:manager:1134 > *         Pipeline dir:  `/sfs/qumulo/qhome/ns5bc/.local/bin`
INFO 16:59:26 | logmuse:manager:1134 > *     Pipeline version:  None
INFO 16:59:26 | logmuse:manager:1134 >
### Arguments passed to pipeline:

INFO 16:59:26 | logmuse:manager:1134 > * `asset_registry_paths`:  `['Nelumbo_nucifera__LOTUS-DB_v1_1/fasta']`
INFO 16:59:26 | logmuse:manager:1134 > *             `assets`:  `None`
INFO 16:59:26 | logmuse:manager:1134 > *            `command`:  `build`
INFO 16:59:26 | logmuse:manager:1134 > *        `config_file`:  `/home/ns5bc/.local/lib/python3.6/site-packages/refgenie/refgenie.yaml`
INFO 16:59:26 | logmuse:manager:1134 > *             `docker`:  `False`
INFO 16:59:26 | logmuse:manager:1134 > *              `files`:  `[['fasta=/project/shefflab/www/refgenie_plantref/Nelumbo_nucifera__LOTUS-DB_v1_1-fasta-fasta']]`
INFO 16:59:26 | logmuse:manager:1134 > *             `genome`:  `None`
INFO 16:59:26 | logmuse:manager:1134 > *      `genome_config`:  `None`
INFO 16:59:26 | logmuse:manager:1134 > * `genome_description`:  `Nelumbo nucifera  LOTUS-DB v1.1`
INFO 16:59:26 | logmuse:manager:1134 > *             `logdev`:  `False`
INFO 16:59:26 | logmuse:manager:1134 > *          `new_start`:  `False`

after build process we have:

INFO 16:59:45 | root:refgenie:560 > Finished building 'fasta' asset
DEBU 17:00:00 | root:refgenie:572 > adding genome (c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f) description: 'Nelumbo nucifera  LOTUS-DB v1.1'
DEBU 17:00:00 | refgenconf.refgenconf:refgenconf:605 > getting asset: 'c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/fasta.None:default'
DEBU 17:00:00 | refgenconf.refgenconf:refgenconf:2242 > checking existence of: c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/fasta:default
DEBU 17:00:00 | refgenconf.refgenconf:refgenconf:615 > Trying absolute path: fasta
DEBU 17:00:00 | refgenconf.refgenconf:refgenconf:2242 > checking existence of: c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/fasta:default
DEBU 17:00:00 | refgenconf.refgenconf:refgenconf:628 > Trying relative to genome_folder/genome/_data (/project/shefflab//deploy/plantref/genomes/c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/data): /project/shefflab/deploy/plantref/genomes/data/c0cc9804e43d0e05b1266f52a29e701958975ec136ef863f/fasta/default
INFO 17:00:00 | refgenconf.refgenconf:refgenconf:461 > Created alias directories:
 - /project/shefflab/deploy/plantref/genomes/alias/Nelumbo_nucifera__LOTUS-DB_v1_1/fasta/default
DEBU 17:00:00 | yacman.yacman:yacman:168 > Wrote to a file: /project/shefflab/deploy/plantref/plantref/config/refgenie_config.yaml
DEBU 17:00:00 | yacman.yacman:yacman:193 > Made object read-only
stolarczyk commented 4 years ago

I was able to reproduce this issue previously in my test fasta build locally. Then I "fixed" it here: 3c3738e and it resolved the issue in my test.

With your recent commits you've reverted this "fix", so I guess that's why it still does not work.

nsheff commented 4 years ago

Can you explain what you mean by "you've reverted this fix"? I did not intend to revert anything, and I see no revert commits.

It wasn't working for me with your fix, and it's still not working for me even with my context manager update. None of these things is solving the issue in my test...

nsheff commented 4 years ago

ok, I just checked out your commit and that did work this time. I'm confusing myself.

I have to go now but I'll double-check later. I do not know what I did wrong. Perhaps it depends on whether the genome is already initialized or not.

stolarczyk commented 4 years ago

Can you explain what you mean by "you've reverted this fix" ?

I might've used the wrong word here. By revert I meant: you brought that part of the code back to a state similar to what it was before my "fix" -- the context manager is there again (my "fix" removed it, because the file locking is handled by a context manager in the set_genome_alias method).

nsheff commented 4 years ago

I can't explain what happened, but now with your change it worked. However, watching the lock files, it's clear that something is still locking the file for periods of time. I think the fact that it worked one time and not another may have something to do with random file system speed issues on rivanna or something.

Anyway, I've reverted my changes (I never seem to be able to figure out how to do that well) -- I just force-pushed to set it back to where you had it; hope the history rewrite doesn't cause trouble for you.