jzhoulab / orca

sequence-based prediction of multiscale genome structure from kilobase to whole-chromosome scale

Installation guide in readme, memory map takes a while message #5

Closed gr-grey closed 10 months ago

gr-grey commented 10 months ago
  1. Revised the Orca installation instructions. The orca_env setup is now three steps: Python 3.9 (with the gcc update and cooltools) -> PyTorch -> Selene. Files involved: README.md modified; orca_env_part1.yml and orca_env_part2.yml added; orca_env.yml deleted.

  2. Memmap creation message. Prints a message reminding users not to kill the program during memory map creation; an interrupted run leaves an incorrect .mmap file, which leads to wrong predictions down the road. File involved: selene_utils2.py
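
For context on why an interrupted write is a problem, here is a minimal sketch of one common way to avoid leaving a half-written file behind: write to a temporary file and rename it into place only after the write completes. This is not what the PR or selene_utils2.py does (the PR only prints a warning). The function names, the 4-bytes-per-base layout, and the `.partial` suffix are all hypothetical and chosen for illustration.

```python
import os
import tempfile

BASES = "ACGT"


def one_hot_bytes(seq):
    """Encode a DNA string as 4 bytes per base (A, C, G, T order).

    Bases outside ACGT (e.g. N) are left as four zero bytes.
    This layout is illustrative only, not Orca's on-disk format.
    """
    out = bytearray(4 * len(seq))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[4 * i + j] = 1
    return bytes(out)


def write_encoding_atomically(seq, final_path, chunk_size=1_000_000):
    """Write the encoding to a temporary file, then rename it into place.

    If the process dies mid-write, final_path is never created, so a
    later run cannot mistake a partial file for a complete one.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as handle:
            for start in range(0, len(seq), chunk_size):
                handle.write(one_hot_bytes(seq[start:start + chunk_size]))
            handle.flush()
            os.fsync(handle.fileno())
        # Rename only after all data is on disk (same directory, same filesystem).
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial file on errors or Ctrl-C, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A hard kill (e.g. SIGKILL) skips the cleanup branch, but in that case only the `.partial` file is left behind and the final path still does not exist.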

Tested the message printed by class MemmapGenome(Genome) in selene_utils2.py:

# Test writing the memory map file
# (the import assumes the script runs from the Orca repo root)
from selene_utils2 import MemmapGenome

reference_sequence = MemmapGenome(
    input_path="./chr9_94904000_126904000.fa",  # custom FASTA file of chr9:94904000-126904000
    memmapfile="./chr9_94904000_126904000.fa.mmap",
    blacklist_regions="hg38",
)

# get_encoding_from_coords returns the one-hot encoded sequence, shape (32000000, 4)
chrom = "Sequence_0"
embedded = reference_sequence.get_encoding_from_coords(chrom, 0, 32000000)

Result:

Creating memmap...
This may take a while (e.g. ~hours for human genome).
If the process is interrupted or killed, the .mmap file will be incorrect,
in which case, delete the mmap file and try again.

jzthree commented 10 months ago

Thanks, looks good! Can you edit your pull request message to describe how you tested the code (I know you have tested it)?