jharno / cubep3m

cosmological n-body code
https://wiki.cita.utoronto.ca/index.php/CubePM
35 stars 11 forks

error in run cubep3m #15

Open wuseyu opened 1 year ago

wuseyu commented 1 year ago

After solving the missing-file problem (../cubep3m.threads.070515/input/checkpoints_high), I ran into some new problems. First, I got this error: [image] Using ltrace, I found that it might be caused by the missing file table_M_Delta.dat, so I added it. The log output afterwards shows that the file is now found, but the problem remains ("FFT direction error"). Finally, I tried editing Run10Codes.pbs, because the log shows that it launches the program with mpirun, and perhaps the program could not find the working directory (I'm not sure this is the right approach; it was just an attempt). [image] This is the final log file: c.log. What could be causing the problem? Looking forward to your reply.

jharno commented 1 year ago

You need to include a '/' at the end of your paths:

/scratch/wuseyu/

and not

/scratch/wuseyu
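A plausible reason the trailing slash matters (an assumption; I have not checked the cubep3m source) is that the code builds file names by plain string concatenation, as Fortran's `//` operator does, rather than by joining path components. The Python sketch below reproduces that effect; `join_prefix` and `with_trailing_slash` are hypothetical helpers for illustration only.

```python
def join_prefix(prefix: str, filename: str) -> str:
    """Naive concatenation, analogous to Fortran's `prefix // filename`."""
    return prefix + filename


def with_trailing_slash(prefix: str) -> str:
    """Return `prefix` guaranteed to end in '/', so concatenation yields a valid path."""
    return prefix if prefix.endswith("/") else prefix + "/"


if __name__ == "__main__":
    print(join_prefix("/scratch/wuseyu", "xv0.ic"))   # /scratch/wuseyuxv0.ic (wrong)
    print(join_prefix("/scratch/wuseyu/", "xv0.ic"))  # /scratch/wuseyu/xv0.ic
    print(join_prefix(with_trailing_slash("/scratch/wuseyu"), "xv0.ic"))  # /scratch/wuseyu/xv0.ic
```

Without the slash, the directory name and the file name run together, so the code looks for a file that does not exist.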

wuseyu commented 1 year ago

Thanks for the reply. I have now tried to run cubep3m, but it's not going well. This is my parameters file: [image] (Should I point "ic_path" to the IC file?) Is there anything else I need to do before running the software? Maybe I'm missing something. [image]

jharno commented 1 year ago

The IC path should be the absolute path to the xv0.ic file.


From: wuseyu; Sent: 13 November 2022 08:25; To: jharno/cubep3m; Cc: Joachim Harnois-Deraps; Subject: Re: [jharno/cubep3m] error in run cubep3m (Issue #15)

wuseyu commented 1 year ago

Sorry, I only just saw the email. I tried setting it to an absolute path yesterday, but it doesn't seem to work.

jharno commented 1 year ago

Can you make sure that when you run cubep3m, the code looks for the IC in /scratch/wuseyu/xv0.ic, and not in /scratch/$USER/?

You need to recompile cubep3m for this to take effect.
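Why `/scratch/$USER/` can fail: `$USER` is expanded by the shell, not by a Fortran `OPEN` statement, so a path compiled into the executable as a literal string points at a directory actually named `$USER`. The Python sketch below (with a hypothetical helper `expand_path`) shows the substitution a shell would perform. That the path is fixed at build time is an inference from the need to recompile, not something verified in the cubep3m source.

```python
import os
from string import Template
from typing import Mapping, Optional


def expand_path(template: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Substitute $VAR and ${VAR} from `env` (default: the process environment),
    mimicking what a shell does before a program ever sees the string.
    Raises KeyError if a referenced variable is not set."""
    return Template(template).substitute(os.environ if env is None else env)


if __name__ == "__main__":
    literal = "/scratch/$USER/xv0.ic"
    print(literal)  # what a program opening the raw string would look for
    print(expand_path(literal, {"USER": "wuseyu"}))  # /scratch/wuseyu/xv0.ic
```

Writing the expanded path (`/scratch/wuseyu/`) directly into the parameters avoids depending on any expansion step.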

wuseyu commented 1 year ago

Hello, I redid everything from scratch, made sure all the files are fresh, and then recompiled "cubep3m", but I still get the "FFT direction error" and an error saying the file "xv0.ic" cannot be found.

This is my parameters file: [image]

This is the log:

[image]

If I change ic_path to "/scratch/$USER/xv0.ic", the log file shows this: [image: 20221114221016]

jharno commented 1 year ago

Right, so you want ic_path = /scratch/wuseyu/

Can you check the size of xv0.ic? Maybe the file is found but can't be read properly. In particular, make sure you do not have the -DBINARY flag in your makefile.
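A file that exists but "can't be read properly" is often a mismatch between how it was written and how it is read, for example Fortran sequential unformatted records (each wrapped in 4-byte length markers) versus raw stream/binary access. I am assuming this is the kind of mismatch the -DBINARY flag controls; that is not verified against the cubep3m source. The Python sketch below, with a hypothetical helper `inspect_fortran_file`, reports the file size and heuristically checks for sequential record markers.

```python
import os
import struct
from typing import Tuple


def inspect_fortran_file(path: str) -> Tuple[int, bool]:
    """Return (size_in_bytes, looks_like_sequential_record).

    Heuristic: a Fortran *sequential unformatted* file starts with a 4-byte
    record-length marker n, followed by n bytes of data and the same marker
    again. A *stream*/raw binary file has no such markers. Assumes 4-byte
    little-endian markers (the gfortran default on x86).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        if len(head) < 4:
            return size, False
        (n,) = struct.unpack("<i", head)
        if n <= 0 or 8 + n > size:
            return size, False
        f.seek(4 + n)
        tail = f.read(4)
    return size, len(tail) == 4 and struct.unpack("<i", tail)[0] == n
```

For example, `inspect_fortran_file("/scratch/wuseyu/xv0.ic")`: a zero or unexpectedly small size points to a failed or truncated write, while a `True` second value suggests record markers that a reader expecting raw stream access would misinterpret (and vice versa).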

wuseyu commented 1 year ago

It looks like there has been some breakthrough!

[image]

Does this mean the program has started successfully? The log still reports "FFT direction error"; does that affect the program?

wuseyu commented 1 year ago

The power in my dormitory has just been cut off for the night, so I will try any new suggestions tomorrow once it is back. Thanks again for your detailed guidance.

wuseyu commented 1 year ago

Sorry for not posting any updates for so long. I was busy with my final exams, which are almost over now, so I would like to continue with this work. What is causing the current memory error? The error is raised when the executable runs; maybe I should modify the code that builds the executable? Anyway, thank you for your continued support. [image]

jharno commented 1 year ago

I am not sure. The problem you are trying to solve is very small, and very small problems have caused issues in the past. Perhaps it would help if you scaled it up a little.

Can you print your parameter file?

wuseyu commented 1 year ago

[image: 13e13820cefd8eb32a1fa789dc9694d] This is the output from running the program yesterday; I don't know whether you need more information.

wuseyu commented 1 year ago

According to the printed log, it seems to have read the correct file. I searched online for similar errors. They seem to be classified as memory errors, but there is no single solution, since there are so many possible causes.

jharno commented 1 year ago

Can you print the parameter file please (not the log)

wuseyu commented 1 year ago

Do you mean the "parameters" file? [image] And this is the current state of the folder "/scratch/${USER}": [image]

wuseyu commented 1 year ago

This is the log file generated by the current run; I think it may help to solve the problem: a.log

wuseyu commented 1 year ago

Could this error line be what makes the run fail? [image]

jharno commented 1 year ago

Sorry, I don't know what that memory line means in terms of problems in the code.

wuseyu commented 1 year ago

Signal 11, officially known as a "segmentation fault", means that the program accessed a memory location that was not allocated to it. That's usually a bug in the program. This is the explanation I found online; I don't know much about this type of error.
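To confirm from a script that a run really died of signal 11 (rather than exiting with an error code), note that on POSIX systems Python's `subprocess` reports a child killed by a signal as a negative return code. This is a minimal sketch; `crashed_with_segfault` is a hypothetical helper, and when the program is launched through `mpirun` the launcher may translate the signal into its own exit code, so the check is most reliable on a single serial process.

```python
import signal
import subprocess
import sys
from typing import Sequence


def crashed_with_segfault(cmd: Sequence[str]) -> bool:
    """Run `cmd` and return True if it was killed by SIGSEGV (signal 11 on Linux).

    On POSIX, subprocess reports termination by signal N as returncode == -N.
    """
    result = subprocess.run(list(cmd))
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Demo: a child Python process that sends itself SIGSEGV.
    child = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(crashed_with_segfault(child))  # True on Linux
```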

wuseyu commented 1 year ago

This may be caused by stack size limits, errors in passing parameters, or something else.
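One way to test the stack-limit hypothesis: Fortran compilers may place large automatic or temporary arrays on the stack, so a low stack limit can produce exactly this kind of segmentation fault. The Python sketch below uses the standard `resource` module to inspect the limit and raise the soft limit to the hard limit for processes launched afterwards; in a job script the usual equivalent is `ulimit -s unlimited` before `mpirun`. This is an assumption about the cause, not a diagnosis, and it only affects processes started from this one, not MPI ranks on other nodes. If the threaded build uses OpenMP (assumed here from the `threads` directory name), per-thread stacks are set separately through the `OMP_STACKSIZE` environment variable.

```python
import resource
from typing import Tuple


def _fmt(limit: int) -> str:
    """Human-readable rlimit value."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit // 1024} KiB"


def stack_limits() -> Tuple[str, str]:
    """Return the (soft, hard) stack-size limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return _fmt(soft), _fmt(hard)


def raise_stack_soft_limit() -> Tuple[int, int]:
    """Raise the soft stack limit to the hard limit. Child processes started
    afterwards (e.g. via subprocess) inherit the new limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    print("stack limit (soft, hard):", stack_limits())
    print("after raising:", raise_stack_soft_limit())
```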

jharno commented 1 year ago

Could you increase "tiles_node_dim" from 2 to 4 in the parameters file, then recompile both the initial-conditions code and the main code, and re-run both? I suspect the problem could be caused by equivalences between arrays, which can sometimes break when the problem is too small.

wuseyu commented 1 year ago

[image] This doesn't seem to work. This run is much slower than the previous one. I checked the modification time of the cubep3m executable, and it has indeed been updated. The data used for analysis is also different, but the program still reports the same error.

jharno commented 1 year ago

Can you run it through a debugger and see what line is causing the crash?
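One way to do this non-interactively is to run the executable under gdb in batch mode, which prints a backtrace when the program stops on the crash. The Python sketch below only builds the command line; `gdb_batch_command` is a hypothetical helper, `./cubep3m` and the process count are placeholders, and wrapping gdb in `mpirun -np N` (one gdb per rank) is a common pattern that I am assuming works with your MPI installation. The backtrace only shows source files and line numbers if the code was compiled with `-g`.

```python
import shlex
from typing import List, Optional, Sequence


def gdb_batch_command(executable: str, args: Sequence[str] = (),
                      nprocs: Optional[int] = None) -> List[str]:
    """Build a command that runs `executable` under gdb non-interactively and
    prints a backtrace (`bt`) when it stops, e.g. on SIGSEGV.
    If `nprocs` is given, wrap it in mpirun so every MPI rank runs its own gdb."""
    cmd = ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", executable, *args]
    if nprocs is not None:
        cmd = ["mpirun", "-np", str(nprocs), *cmd]
    return cmd


if __name__ == "__main__":
    print(shlex.join(gdb_batch_command("./cubep3m", nprocs=1)))
    # mpirun -np 1 gdb -batch -ex run -ex bt --args ./cubep3m
```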


From: wuseyu; Sent: 08 December 2022 16:11; To: jharno/cubep3m; Cc: Joachim Harnois-Deraps; Subject: Re: [jharno/cubep3m] error in run cubep3m (Issue #15)

wuseyu commented 1 year ago

Sorry, school kept me busier than I expected, but everything is settled now. I tried adding -g, -Og, -fcheck=all and -fbacktrace to the fflags in the Make_PP_THREADS file that builds the executable, and then ran cubep3m, but the debug flags don't seem to have any effect. Should I make these changes in this file, or in some other file? [image] [image] [image]

[image]
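A few things can make added debug flags appear to do nothing: the variable you edited is not the one used in the compile rule, the objects were not rebuilt after the edit (for example, no `make clean` first), or the compiler is not gfortran (`-fcheck=all` and `-fbacktrace` are gfortran options; Intel Fortran uses `-traceback` and `-check all` instead). The Python sketch below checks the first point by listing the flags assigned to a variable in a makefile. The variable name `FFLAGS` is an assumption (make variable names are case-sensitive, so pass whatever name Make_PP_THREADS actually uses), and backslash line continuations are not handled.

```python
import re
from typing import List, Sequence

DEBUG_FLAGS = ("-g", "-Og", "-fcheck=all", "-fbacktrace")  # gfortran flags tried in this thread


def flags_assigned(makefile_text: str, variable: str = "FFLAGS") -> List[str]:
    """Collect tokens assigned to `variable` via =, :=, ?= or += (one line each;
    backslash continuations are not handled)."""
    pattern = re.compile(
        rf"^\s*{re.escape(variable)}\s*(?::=|\?=|\+=|=)(.*)$", re.MULTILINE
    )
    tokens: List[str] = []
    for match in pattern.finditer(makefile_text):
        tokens.extend(match.group(1).split())
    return tokens


def missing_flags(makefile_text: str, wanted: Sequence[str] = DEBUG_FLAGS,
                  variable: str = "FFLAGS") -> List[str]:
    """Return the flags in `wanted` that never appear in `variable`."""
    present = set(flags_assigned(makefile_text, variable))
    return [flag for flag in wanted if flag not in present]


if __name__ == "__main__":
    demo = "FC = mpif90\nFFLAGS = -O3 -g\nFFLAGS += -fbacktrace\n"
    print(flags_assigned(demo))  # ['-O3', '-g', '-fbacktrace']
    print(missing_flags(demo))   # ['-Og', '-fcheck=all']
```

If the flags are all present but the backtrace still lacks line numbers, the next suspects are stale object files or a different compiler than expected.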