kirxkirx / vast

Variability Search Toolkit (VaST)
http://scan.sai.msu.ru/vast/
GNU General Public License v3.0

passing fits files via command line has upper limit #6

Closed mrosseel closed 4 years ago

mrosseel commented 4 years ago

For some star fields I have 20k+ FITS files (many years of observations). When I pass these to VaST as *.fits, Linux complains about the size of the glob expansion. There seems to be no easy solution for this.

Would it be possible to pass the list of fits files in e.g. a .txt file to circumvent this limitation?

thx Mike

kirxkirx commented 4 years ago

Hi Mike,

thanks for the report - very informative and useful, as usual!

The first thing to try is to pass VaST the directory containing images, rather than the images themselves. Individual images and directories can be mixed on the command line. VaST will try to go two levels down the directory tree searching for images (i.e. you may specify the path to a directory containing sub-directories with images).

Also, there is a way to specify the input images in a text file instead of the command line. The text file should be named vast_list_of_input_images_with_time_corrections.txt and placed in the VaST working directory.

The file should contain two columns: the first is the path to an input image (one image per line), and the second is either a clock correction in seconds or the Julian Date of the middle of the exposure for that particular image. VaST guesses from the numerical value whether the second column lists a JD or a clock offset in seconds. You may find an example in vast_list_of_input_images_with_time_corrections.txt_example.

The input file may be generated with a command like

    ls ../sample_data/f_72-0*.fit | awk '{print $1" 0.0"}' > vast_list_of_input_images_with_time_corrections.txt

After creating the image list file you may run ./vast without specifying any images on the command line. The image path should contain no white spaces (sorry); if it does, create a symlink.
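With tens of thousands of images, the ls pattern in the command above may itself run into the same argument-length limit. Here is a minimal sketch of one way around that, assuming the images sit under a single directory tree and use a .fits extension: let find enumerate the files so the shell never expands a large glob. The function name make_vast_image_list and its arguments are hypothetical and not part of VaST; the "0.0" second column mirrors the zero clock correction used above.

```shell
# Hypothetical helper (not part of VaST): list every *.fits entry under a
# directory, one per line, followed by a zero clock correction.
# Assumes paths contain no white space, as VaST requires.
make_vast_image_list() {
    image_dir=$1
    list_file=${2:-vast_list_of_input_images_with_time_corrections.txt}
    # find walks the directory itself, so no glob is expanded on a command line
    find "$image_dir" -name '*.fits' | sort |
        awk '{print $1" 0.0"}' > "$list_file"
}
```

Called from the VaST working directory as make_vast_image_list /path/to/images, this would overwrite any earlier list file, after which ./vast can be run without image arguments.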

Don't forget to remove an old vast_list_of_input_images_with_time_corrections.txt before processing a new image set; otherwise the images listed in the file and those given on the command line may get mixed together.

It is silly that VaST expects a clock offset to be specified for each image, even if the offset is zero. I'll change it to allow specifying just the image path.

Note that there is also a hard limit on the number of images in src/vast_limits.h:

    #define MAX_NUMBER_OF_OBSERVATIONS 120000 // per star

You may change it and recompile with make. VaST uses such hard limits in a few places throughout the code because it is often faster to ask for a big chunk of memory (and then not use most of it) than to ask for more memory with realloc() each time the program needs to store a new item. This trick relies on the lazy memory allocation in Linux.
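As a rough sketch of the edit described above, the limit can be changed from the shell before rebuilding. The function name set_max_observations is hypothetical, and the value 200000 in the usage note is only an example; the sketch assumes the #define sits on a single line in the form quoted above.

```shell
# Hypothetical helper (not part of VaST): rewrite the numeric value of
# MAX_NUMBER_OF_OBSERVATIONS in a header file, leaving any trailing comment.
set_max_observations() {
    header=$1
    new_value=$2
    # write to a temporary file first; sed -i is not portable across systems
    sed 's/^\(#define[[:space:]]\{1,\}MAX_NUMBER_OF_OBSERVATIONS[[:space:]]\{1,\}\)[0-9]\{1,\}/\1'"$new_value"'/' \
        "$header" > "$header.tmp" && mv "$header.tmp" "$header"
}
```

From the VaST directory, something like set_max_observations src/vast_limits.h 200000 && make would then apply the new limit and recompile.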

mrosseel commented 4 years ago

thx for the clear answer, your first suggestion (passing the dir) worked perfectly. Some remarks:

You can close this issue, my problem was resolved!

kirxkirx commented 4 years ago

Yes, it is possible to pass as command line arguments first the reference image and then the directory containing the images. If the reference image is in the same directory as the other images, it should still be OK: VaST will figure out that the "new" image matches the reference image perfectly (if it's the same image) and will avoid duplicate output.

I'll keep the Issue open for a bit longer as a reminder for myself...

mrosseel commented 4 years ago

since this issue is resolved I'll close this and make a new issue for the proposed performance enhancements