training reproduce error

jzhoulab / orca

sequence-based prediction of multiscale genome structure from kilobase to whole-chromosome scale

training reproduce error #10

Open mzc2113391 opened 5 months ago

mzc2113391 commented 5 months ago

Hello, I tried to reproduce the Orca training, but I ran into some bugs.

First, when I run python misc/make_genome_memmap.py, I get this error: AttributeError: 'MemmapGenome' object has no attribute 'initialized'. When I manually add self.initialized to the MemmapGenome class, this step works.
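A minimal, hypothetical illustration of one common way an AttributeError like this arises: a base class sets a flag such as initialized in its __init__, and a subclass overrides __init__ without calling super().__init__(), so the flag is never created. GenomeBase, BrokenGenome, and FixedGenome are made-up names, not selene's or Orca's actual classes, and the real cause here may differ. The maintainer's suggested fix in this thread was to install a specific Selene branch rather than patch the class.

```python
class GenomeBase:
    """Hypothetical base class that sets up its resources lazily."""

    def __init__(self, path):
        self.path = path
        self.initialized = False  # flag checked before first use

    def _ensure_init(self):
        # Reading self.initialized raises AttributeError if __init__ never set it.
        if not self.initialized:
            self._handle = f"opened:{self.path}"  # placeholder for real setup
            self.initialized = True

    def get_handle(self):
        self._ensure_init()
        return self._handle


class BrokenGenome(GenomeBase):
    def __init__(self, path):
        # Bug: skips super().__init__(), so self.initialized is never created.
        self.path = path


class FixedGenome(GenomeBase):
    def __init__(self, path):
        super().__init__(path)  # base class sets self.initialized = False
```

Adding the attribute by hand silences the error, but it can also hide a version mismatch between a subclass and the base class it was written against.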

But when I run python train_h1esc_a.py, I get a new error:

  File "/backup1/orca/orca-main/train/../selene_utils2.py", line 1135, in sample
    targets = np.zeros((batch_size, *self.target.shape))
AttributeError: 'GenomicFeatures' object has no attribute 'shape'
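The failing line builds the target buffer by unpacking self.target.shape, so whatever object the sampler uses as its target must expose a shape attribute, and the installed GenomicFeatures class apparently does not. Below is a hypothetical sketch of that contract. ToyTarget and allocate_targets are made-up names, not code from selene_utils2.py, and the sketch checks for the attribute up front to fail with a clearer message.

```python
import numpy as np


class ToyTarget:
    """Hypothetical target object exposing the .shape the sampler expects."""

    def __init__(self, shape):
        self.shape = tuple(shape)


def allocate_targets(batch_size, target):
    """Allocate a zero-filled target batch the same way the failing line does."""
    if not hasattr(target, "shape"):
        raise TypeError(
            f"{type(target).__name__} has no 'shape' attribute; the installed "
            "target class may not match what the training code expects"
        )
    # np.zeros defaults to float64, so this buffer uses 8 bytes per value.
    return np.zeros((batch_size, *target.shape))
```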

Could you give me some help? Thanks!

jzthree commented 5 months ago

You should install Selene with the provided commands:

git clone https://github.com/kathyxchen/selene.git
cd selene
git checkout custom_target_support
python setup.py build_ext --inplace
python setup.py install
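After running the commands above, it may help to confirm which copy of Selene Python actually imports. An older pip-installed copy on the path could still shadow the one built from the custom_target_support branch. Here is a small standard-library sketch. It assumes selene_sdk is the package name Selene installs, and locate_package is a made-up helper.

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Prints the path of the selene_sdk package Python would import (or None).
    print(locate_package("selene_sdk"))
```

Run it from outside the cloned selene directory. Otherwise the local source tree may be found first, because the current directory is usually on the import path.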

mzc2113391 commented 5 months ago

Thanks a lot! I found that during the SamplerDataLoader step, memory usage gets so high that my machine with 128 GB of RAM crashes. How much memory does Orca need? Is there a better way to allocate memory? Can I reduce the batch size to lower memory usage? Though I think reducing the batch size may not be a good choice 😭😭

jzthree commented 5 months ago

There might be other places, but I think the main thing is that you should reduce num_workers in data_loader.
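Here is a rough, back-of-envelope sketch of why num_workers matters. With a PyTorch-style data loader, each worker prepares batches ahead of time (by default roughly 2 batches per worker, PyTorch's prefetch_factor). The number of batches held in memory therefore grows with num_workers, and the size of each batch grows linearly with batch_size. The function counts only the one-hot sequence and target buffers and ignores each worker process's own memory. The sequence length, channel count, and target shape are parameters you would fill in from the actual Orca configuration. estimate_batch_memory is a hypothetical helper.

```python
import math


def estimate_batch_memory(
    batch_size,
    seq_len,
    target_shape,
    num_workers,
    n_channels=4,
    prefetch_factor=2,
    seq_bytes=4,
    target_bytes=8,
):
    """Rough bytes held by batches queued in data-loader workers.

    seq_bytes=4 assumes float32 one-hot sequences; target_bytes=8 matches
    np.zeros' default float64. Per-process overhead is not included.
    """
    per_batch = batch_size * (
        seq_len * n_channels * seq_bytes + math.prod(target_shape) * target_bytes
    )
    # With num_workers=0, batches are built in the main process, about one at a time.
    batches_in_flight = max(1, num_workers * prefetch_factor)
    return per_batch * batches_in_flight
```

Halving num_workers roughly halves this term, while halving batch_size shrinks every queued batch instead. Which one to change first likely depends on whether data loading or GPU compute is the bottleneck.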

jimmylihui commented 4 months ago

> Thanks a lot! I found that during the SamplerDataLoader step, memory usage gets so high that my machine with 128 GB of RAM crashes. How much memory does Orca need? Is there a better way to allocate memory? Can I reduce the batch size to lower memory usage? Though I think reducing the batch size may not be a good choice 😭😭

I think you didn't run make_genome_memmap.py.
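The likely point of the memmap step is that training code can read genome windows from a memory-mapped file, so each data-loader worker does not need its own in-memory copy of the genome. Pages read from a file-backed memmap can be shared through the OS page cache. Below is a minimal sketch of that idea with numpy.memmap. write_memmap and read_window are hypothetical helpers, and this is not the file format make_genome_memmap.py actually produces.

```python
import os
import tempfile

import numpy as np


def write_memmap(path, seq):
    """Encode a sequence string as uint8 codes, save it as a raw memmap file, return its length."""
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    mm = np.memmap(path, dtype=np.uint8, mode="w+", shape=codes.shape)
    mm[:] = codes
    mm.flush()  # make sure the bytes reach the file
    return codes.shape[0]


def read_window(path, length, start, end):
    """Open the file read-only and return one window; only the touched pages are read."""
    mm = np.memmap(path, dtype=np.uint8, mode="r", shape=(length,))
    return mm[start:end].tobytes().decode("ascii")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_genome.u8")
        n = write_memmap(path, "ACGTACGTNN")
        print(read_window(path, n, 2, 6))  # prints GTAC
```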