jharno / cubep3m

cosmological n-body code
https://wiki.cita.utoronto.ca/index.php/CubePM

error in run cubep3m #15

Open wuseyu opened 1 year ago

wuseyu commented 1 year ago

After solving the missing-file problem (../cubep3m.threads.070515/input/checkpoints_high), I ran into some new problems. At first, I hit this error: [screenshot] Using ltrace, I found that it may be caused by a missing file, table_M_Delta.dat, so I added it. The log file now shows that the file was found, but the problem is still not solved (FFT direction error). Finally, I tried editing the file Run10Codes.pbs, because the log file shows that it calls mpirun and perhaps it can't find the working directory. (I'm not sure this is the right direction; it was just an attempt.) [screenshot] This is the final log file I got: c.log. What is causing the problem? I look forward to your reply.

jharno commented 1 year ago

The table you are missing is in source_threads. I think the problem is with the paths you set in your parameter file. Did you source the README file to get a corrected one?

wuseyu commented 1 year ago

Yes. Do you mean I should replace these paths with the correct ones? [screenshot]

wuseyu commented 1 year ago

How should I set these paths in the parameters file? Are they absolute or relative paths? I don't have a folder named scratch under my source_threads folder. Could you post a sample screenshot of the complete project?

wuseyu commented 1 year ago

[screenshot] I found many errors of this kind in the log file. Does that mean the first path is set incorrectly? [screenshot]

jharno commented 1 year ago

Hi,

These are absolute paths, so you need to change them to the location where you want to read the IC files from (ic_path) and where you want to output the data, projects and halo catalogues. These could all be the same directory, if you want.

Again, if you do

bash README

it will set the paths correctly.

The initial conditions must be run first. This is done with:

Compile:

COMPILE_dist_init.csh

Run:

mpirun -n $MPI_TASKS dist_init_dm
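
The ordering above (generate the initial conditions first, then run cubep3m) can be enforced with a small guard. This is a minimal sketch, not part of cubep3m: `have_initial_conditions` is a hypothetical helper, and it assumes the IC generator leaves non-empty files with a `.ic` suffix in the directory that ic_path points to.

```shell
# Hypothetical helper (not part of cubep3m): succeeds only if the directory
# given as $1 holds at least one non-empty file whose name ends in .ic.
have_initial_conditions() {
    local dir="$1" f
    for f in "$dir"/*.ic; do
        # -s is true only for an existing file of non-zero size; if the glob
        # matches nothing it stays literal and this test simply fails.
        [ -s "$f" ] && return 0
    done
    return 1
}
```

A possible use is `have_initial_conditions "$IC_DIR" || echo "run dist_init first"`, where `IC_DIR` is a placeholder for whatever ic_path is set to in the parameter file.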

wuseyu commented 1 year ago

Hi, I have executed the commands in the README file once, but I still get the same error. I wonder whether an input file is required under ic_path. I want to test whether this program works properly, so could you give me a test file to run it with? Thanks for your reply.

jharno commented 1 year ago

The initial conditions must be created before running cubep3m. Since these are written as binaries, they will vary with the machine you work on and with the parameter file you provide, so I am afraid you need to generate the IC yourself. Can you try to compile and run dist_init_dm?

wuseyu commented 1 year ago

Because of my gcc environment, I had to change and add some fflags, following the same process as before. Now I get this error: [screenshot] I think this value must be defined in another file, because this is its only use in dist_init_dm.f90. I found this variable in the parameter file, but only in a one-line comment. [screenshot]

jharno commented 1 year ago

I think you need to find out how to use and call random_seed with your compiler.

wuseyu commented 1 year ago

Thanks for the reply. I'll try to find out, and I'll post an update if there is any progress.

wuseyu commented 1 year ago

Hello, I'm very sorry to disturb you again. This usage of the function is not supported by the gcc compiler, so I removed it, and no error is reported so far. The compilation generates many files, including one executable, "dist_init", under the "/utils/dist_init" folder. I tried to execute it, and some errors occurred. [screenshot] After this compilation, I can't find any files with the ".ic" suffix. [screenshot]

jharno commented 1 year ago

The '.ic' files are the particle data that dist_init writes when it does not crash. I can see from the screenshot above that you do have the missing file (camb_WMAP5_transfer_z0.dat); it is in your utils/dist_init/ directory. I think the code wants that file to be in your /batch directory. Could you try copying it there?

wuseyu commented 1 year ago

I have moved the file to the "/batch" folder, but "dist_init" still can't find it. Is anything wrong? [screenshot]

wuseyu commented 1 year ago

I tried running it in the environment and moved the executable "dist_init" to the root directory of cubep3m, but it still can't find the file.

jharno commented 1 year ago

Can you keep dist_init in its original folder (utils/dist_init/), but copy the camb file there as well?

wuseyu commented 1 year ago

Do you mean I should move the file "camb_WMAP5_transfer_z0.dat" to the "utils/dist_init" directory? Actually, the file is already there. I overwrote it and tried again, but it did not work. Then I tried removing it from "./cubep3m/" and "./cubep3m/batch" and compiling "dist_init" under the folder "./cubep3m/utils/dist_init/", but the error still occurred.

jharno commented 1 year ago

The file camb_WMAP5_transfer_z0.dat needs to be in the same directory as the executable ./dist_init which you use to run the initial condition code.

wuseyu commented 1 year ago

Yes, this file is generated correctly, and I can see that it was recently updated, but the executable dist_init doesn't seem to find it.

jharno commented 1 year ago

Is the file empty?

wuseyu commented 1 year ago

Sorry, our school does not provide electricity at night, I will check it tomorrow morning.

wuseyu commented 1 year ago

Yes, the file camb_WMAP5_transfer_z0.dat is empty, but it was updated.

jharno commented 1 year ago

OK, so that means the real file wasn't in the correct directory to start with. Can you make sure you copy the original file to this location and try again (overwrite the empty file)? The code should find it now.

Cheers, Joachim
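
A quick way to tell a missing input file from an empty one before running the IC code is sketched below. `check_input_file` is a hypothetical helper, not something shipped with cubep3m. One plausible reason an empty file keeps reappearing is that a Fortran open on a path that does not exist can create it, but that is an assumption about this code path, not something confirmed here.

```shell
# Hypothetical helper: report whether the file given as $1 is missing,
# empty, or usable. Return codes: 0 = ok, 1 = missing, 2 = empty.
check_input_file() {
    local path="$1"
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 1
    elif [ ! -s "$path" ]; then
        # The file exists but has zero bytes, so there is nothing to read.
        echo "empty: $path"
        return 2
    fi
    echo "ok: $path"
}
```

Run from the directory that holds the executable, for example `check_input_file camb_WMAP5_transfer_z0.dat`. An "empty" result suggests restoring the original file from the repository rather than re-running the code.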

wuseyu commented 1 year ago

Thanks for your reply, I will try it now.

wuseyu commented 1 year ago

The executable dist_init was originally in the folder "./cubep3m/utils/dist_init", and camb_WMAP5_transfer_z0.dat is the file that is generated after running the executable, so I deleted it. Now I have run "dist_init" again: camb_WMAP5_transfer_z0.dat is generated again, but the error still occurs and the file is still empty. [screenshot]

jharno commented 1 year ago

You need a non-empty file. Can you pull it again from GitHub?

wuseyu commented 1 year ago

[screenshot] I mean that this file is an intermediate generated file and is not available on GitHub. Why can it be generated normally where it is needed, yet not be read by the executable "dist_init"?

jharno commented 1 year ago

It is not an intermediate file; it is the transfer function generated by CAMB (a different code), and it can be found on GitHub in the /batch directory.

wuseyu commented 1 year ago

[screenshot] dist_init seems to be working, but it fails when caching Delta on disk. Is this function just one part of the software that can't work on its own, so that I need to set its path in parameters and then run the cubep3m executable?

wuseyu commented 1 year ago

Now there is only one file (seed0.init) in the folder /scratch/${USER}, and this is its content. [screenshot]

wuseyu commented 1 year ago

I ran dist_init again and the content of this file was refreshed, so it must be written by dist_init. The program apparently wants to find a file named "delta0". At first I thought the file had been written to the wrong place rather than to "/scratch/${USER}", but when I ran "git status", I didn't find this file anywhere under the repository folder.

jharno commented 1 year ago

The directory

/scratch/wuseyu/delta0

needs to exist; it is where the files will be written. If it does not exist, then you should change the ic_path in the parameter file to the place where you want to write the data.
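
Whether the target location exists and can be written to is easy to check before launching anything. A minimal sketch, with `ensure_writable_dir` as a hypothetical helper name:

```shell
# Hypothetical helper: create the directory given as $1 (including parents)
# if needed, then succeed only if it exists and the current user can write to it.
ensure_writable_dir() {
    local dir="$1"
    mkdir -p -- "$dir" || return 1
    [ -d "$dir" ] && [ -w "$dir" ]
}
```

For instance, `ensure_writable_dir "/scratch/$USER" || echo "pick another ic_path"`. On a machine where /scratch is not available to the user, the call fails and points at the path setting rather than at the code.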

wuseyu commented 1 year ago

First I tried creating the "delta0" folder under "/scratch/${USER}" and running dist_init, but the result was no different from before. So I changed "ic_path" in the parameters file to "/scratch/${USER}/delta0", but running cubep3m still reports "FFT direction error". The "delta0" folder is still empty after these attempts.

jharno commented 1 year ago

These are absolute paths. Does that directory exist on your machine? Remember that you should not run cubep3m until you have successfully generated your initial conditions with dist_init.

jharno commented 1 year ago

Ah, sorry, I skipped over one of your comments. I see now that the directory exists and that you've created the seed file, but that the code can't produce the delta function there.

wuseyu commented 1 year ago

Yes, that's what's bothering me right now. Thanks.

wuseyu commented 1 year ago

The seed0.init file could find the path "/scratch/${USER}", but the delta function couldn't.

jharno commented 1 year ago

Found the problem: can you make sure that in your parameter file your scratch_path is the same as your ic_path?

jharno commented 1 year ago

Then recompile dist_init and run again

wuseyu commented 1 year ago

They are the same; I just replaced "$USER" with my username. Then I copied the ic_path from the original file, but it still failed. [screenshot] [screenshot]

jharno commented 1 year ago

OK, the code is trying to write an intermediate file (delta0) in a directory that exists, but complains it can't. That means the problem is likely with the file opening statement. That's in line 791:

open(11,file=fn,form=IOform,iostat=ioerr,status='replace')

How do you compile the code?

wuseyu commented 1 year ago

I just run "./dist_init" in the folder "utils/dist_init/". Should I do more?

jharno commented 1 year ago

Did you recompile the code (cd batch/ ; bash COMPILE_dist_init.csh)?

wuseyu commented 1 year ago

Yes, I've done this before, but just to be safe I ran it again in the environment and tried "./dist_init". [screenshot]

jharno commented 1 year ago

OK, the error comes indeed from the fact that the open statement can't deal with binaries in gfortran (works with intel compilers, though). That appears as a warning when you compile, but causes the crash. You need to remove the

-DBINARY

flag in the COMPILE_dist_init.csh, recompile, then re-run.
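
The flag can also be removed with a one-line filter instead of editing by hand. The sketch below assumes that -DBINARY appears in the script as a whitespace-separated token; the actual layout of COMPILE_dist_init.csh is not shown in this thread, and `strip_binary_flag` is a hypothetical helper name. It prints the result to standard output and leaves the original script untouched.

```shell
# Hypothetical helper: print the file given as $1 with each standalone
# " -DBINARY" token removed. Longer flags such as -DBINARY_FOO are left alone.
strip_binary_flag() {
    # -E selects extended regular expressions (GNU and BSD sed both accept it).
    # The character that followed the flag (whitespace or end of line) is kept.
    sed -E 's/[[:space:]]+-DBINARY([[:space:]]|$)/\1/g' "$1"
}
```

A cautious workflow is to write the output to a new file, for example `strip_binary_flag COMPILE_dist_init.csh > COMPILE_nobinary.csh` (the output name is arbitrary), compare the two with `diff`, and only then recompile with the edited script.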

wuseyu commented 1 year ago

[screenshot] OK, now these two warnings have gone away. Maybe the other warning caused the program to crash?

wuseyu commented 1 year ago

Sorry, the end of term is approaching and coursework is piling up, so I have been a bit busy. I took the time today to review our discussion again. Surprisingly, after I reconfigured the environment and ran "COMPILE_dist_init.csh" and "dist_init" again, a new folder was generated under the "source_threads" folder, and it contains the ic file we were looking for! [screenshot]

jharno commented 1 year ago

Oh, that means your paths are ill-defined. You might be missing a trailing '/'
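
To illustrate why a trailing '/' can matter: if a program builds file names by gluing the path string directly onto the file name (an assumption about how cubep3m uses these parameters, not something verified here), then a path without the final slash produces a different name than intended. `with_trailing_slash` below is a hypothetical helper that normalizes a path before it is pasted into the parameter file.

```shell
# Hypothetical helper: print the path given as $1 with exactly one '/'
# appended when it does not already end in one.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Plain string concatenation, as a program might do it internally:
p="/scratch/user"
echo "${p}xv0.ic"                          # prints /scratch/userxv0.ic
echo "$(with_trailing_slash "$p")xv0.ic"   # prints /scratch/user/xv0.ic
```

The path /scratch/user is only an example value.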

wuseyu commented 1 year ago

I'm sure my path is an absolute path, but the problem with the ic file is solved.

wuseyu commented 1 year ago

Thanks for your continued attention and help. Should I now run "cubep3m"? At least the parameters file should be updated. Does that mean I should set ic_path to "/root/hpcrunner_1/tmp/cubep3m.threads.070515/source_threads/scratch"? [screenshot] Should the locations of the next two paths not affect how the program runs? "cubep3m_root" should remain as it is, right?

wuseyu commented 1 year ago

Hello, [screenshot] This is my latest log. The reason for the "FFT direction error" is still unknown, but now I'm trying to deal with the program not being able to find the file "/scratch/$USER/xv0.ic". First I thought its name was the same as in the log (in folder "/root/hpcrunner_1/tmp/cubep3m.threads.070515/source_threads/scratch"), so I moved it to the folder "/root/hpcrunner_1/tmp/cubep3m.threads.070515/source_threads/scratch/wuseyu" and renamed it to "xv0.ic"; unfortunately this change didn't work. Next, I thought it might be caused by the path, because my parameters file is configured like this: [screenshot] I tried modifying "ic_path" to find the file.

Relative path: [screenshot]

Absolute path: [screenshot]

I found that this only changed where the file was generated; the "xv0.ic" file was still not found.