"Spatial Position is Not a Number" Error when running TraP

ycendes commented 9 years ago

Running through Folkert's latest batch of images, i.e. those in /scratch/fhuizing/aartfaac/results/shower/s256/, I was getting the following error:

RuntimeError: Spatial position is not a number

When running in serial mode, here is the full error, including the image where it fails (which, on inspection, is an image plagued by RFI):

INFO:tkp.steps.source_extraction:Extracting image: /scratch/fhuizing/aartfaac/results/shower/s256/F5e+07_S0-62_T13-08-2015_22-28-27.image
WARNING:tkp.sourcefinder.extract:Physical coordinates failed at 648.340581, -256.384013
WARNING:tkp.sourcefinder.image:Island not processed; unphysical?
Traceback (most recent call last):
  File "/home/ycendes/aartfaacenv/bin/trap-manage.py", line 10, in <module>
    execfile(__file__)
  File "/home/ycendes/tkp/tkp/bin/trap-manage.py", line 10, in <module>
    tkp.management.main()
  File "/home/ycendes/tkp/tkp/management.py", line 323, in main
    args.func(args)
  File "/home/ycendes/tkp/tkp/management.py", line 223, in run_job
    run(args.name, monitor_coords)
  File "/home/ycendes/tkp/tkp/main.py", line 138, in run
    extraction_results = runner.map("extract_sources", urls, arguments)
  File "/home/ycendes/tkp/tkp/distribute/__init__.py", line 42, in map
    return self.module.map(func, iterable, args)
  File "/home/ycendes/tkp/tkp/distribute/serial/__init__.py", line 3, in map
    x = [func(i, *arguments) for i in iterable]
  File "/home/ycendes/tkp/tkp/distribute/serial/tasks.py", line 22, in extract_sources
    return tkp.steps.source_extraction.extract_sources(url, extraction_params)
  File "/home/ycendes/tkp/tkp/steps/source_extraction.py", line 50, in extract_sources
    force_beam=extraction_params['force_beam']
  File "/home/ycendes/tkp/tkp/sourcefinder/image.py", line 401, in extract
    labelled_data=labelled_data, labels=labels
  File "/home/ycendes/tkp/tkp/sourcefinder/image.py", line 863, in _pyse
    det = extract.Detection(measurement, self, chunk=island.chunk)
  File "/home/ycendes/tkp/tkp/sourcefinder/extract.py", line 754, in __init__
    self._physical_coordinates()
  File "/home/ycendes/tkp/tkp/sourcefinder/extract.py", line 820, in _physical_coordinates
    [self.x.value, self.y.value])]
  File "/home/ycendes/tkp/tkp/utility/coordinates.py", line 668, in p2s
    raise RuntimeError("Spatial position is not a number")
RuntimeError: Spatial position is not a number

I am currently working in ycendes@struis:/scratch/ycendes/aartfaac/cendesrfitest

mkuiack commented 9 years ago

You can set the detection threshold to >= 18. Below that, these two sources are confused as one, resulting in nonsense values for the Gaussian fit.

I'm not sure why a bad fit gives such nonsense values (a negative pixel location), though.

[Attached image: traperror465_island]
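The warning in the log reports a fitted position of (648.340581, -256.384013), which lies outside any image. A minimal sketch of a sanity check that would catch such a fit before it is converted to sky coordinates; `is_physical_pixel` is a hypothetical helper (not part of tkp), and it assumes 0-based pixel coordinates and an image shape known up front. The 1024 x 1024 shape in the example is an assumption for illustration only.

```python
import math


def is_physical_pixel(x, y, shape):
    """Return True if (x, y) is a finite position inside an image of `shape`.

    Hypothetical helper, not part of tkp. Assumes 0-based pixel coordinates
    with x running along the first axis and y along the second.
    """
    nx, ny = shape
    # NaN or infinite centroids come from a diverged fit.
    if not (math.isfinite(x) and math.isfinite(y)):
        return False
    # A centroid outside the pixel grid cannot be a real source in this image.
    return 0 <= x < nx and 0 <= y < ny


# The fitted position from the warning in the log, checked against an
# assumed 1024 x 1024 image: the negative y coordinate fails the check.
print(is_physical_pixel(648.340581, -256.384013, (1024, 1024)))  # False
```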

AntoniaR commented 9 years ago

What does your figure show? Is this the source that it is failing on? Also, a couple of source-finder questions: is this a completely free fit, or one of the forced fits to a Gaussian? And have you tried changing the deblending settings and the RMS grid size?

Ideally, if the image is plagued by RFI, I think we should reject it automatically. However, it is concerning that TraP crashes so badly on this image: it should simply skip any sources that it fails to fit or that give a NaN result.
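The suggested behaviour, skipping a source rather than aborting the whole run, could look roughly like the sketch below. It assumes a per-island fit function that either returns an (x, y) position or raises `RuntimeError` (the exception type seen in the traceback); `extract_all` and `toy_fit` are hypothetical names, not tkp functions.

```python
import logging
import math

logger = logging.getLogger(__name__)


def extract_all(islands, fit_island):
    """Fit every island and keep only fits with finite positions.

    Hypothetical sketch, not tkp code: `fit_island` is assumed to return an
    (x, y) tuple for an island or to raise RuntimeError when the fit fails.
    """
    results = []
    for island in islands:
        try:
            x, y = fit_island(island)
        except RuntimeError as exc:
            # Drop the whole detection so no half-built result reaches later steps.
            logger.warning("Skipping island %r: %s", island, exc)
            continue
        if not (math.isfinite(x) and math.isfinite(y)):
            logger.warning("Skipping island %r: non-finite fit (%s, %s)", island, x, y)
            continue
        results.append((island, x, y))
    return results


# Toy fit function standing in for the real Gaussian fit.
def toy_fit(island):
    if island == "bad":
        raise RuntimeError("Spatial position is not a number")
    if island == "nan":
        return float("nan"), 1.0
    return 1.0, 2.0


kept = extract_all(["ok", "bad", "nan"], toy_fit)
print(kept)  # [('ok', 1.0, 2.0)]
```

Dropping the whole detection, rather than carrying on with partially filled fields, keeps invalid values from reaching later pipeline stages.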

mkuiack commented 9 years ago

Yes, these are the sources that cause it to fail. The source extractor sees them as one source and tries to fit a single Gaussian to it. I've attached the mask image of all the sources; this one is the light blue one near the top right. Note that the red, double-looking one near the bottom left is actually two sources.

The result with a detection threshold of 18 can be seen here: http://banana.transientskp.org/master/vlo_marknewdb_2/image/5497/

[Attached image: traperror465]

mkuiack commented 9 years ago

Yes, it is a free fit; that is, no parameters are fixed when fitting. No, I haven't changed any of the deblending settings.

AntoniaR commented 9 years ago

That might be worth trying. However, if this is simply caused by bad-quality data, I would be tempted to say that we should find a way to reject these images rather than put a lot of effort into solving the source finder problem.

gijzelaerr commented 9 years ago

I think I agree with @AntoniaR here: we should first try just skipping these images, since running the source extractor on them would ruin your lightcurve anyway. @ycendes knows how to do this but hasn't gotten to it yet.
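One way to skip such images is a quality gate that rejects an image whose noise level is an outlier relative to the rest of the run. A minimal sketch, assuming a per-image RMS estimate is available; `reject_by_rms` and the cut of 3 median absolute deviations are hypothetical choices, not TraP's own quality-control logic.

```python
import statistics


def reject_by_rms(rms_values, max_deviations=3.0):
    """Return indices of images whose RMS is an outlier.

    Uses the median and the median absolute deviation (MAD) so that a few
    RFI-affected images do not inflate the reference noise level.
    """
    median = statistics.median(rms_values)
    mad = statistics.median(abs(r - median) for r in rms_values)
    if mad == 0:
        # Most images share the median RMS exactly; flag any that differ.
        return [i for i, r in enumerate(rms_values) if r != median]
    return [
        i for i, r in enumerate(rms_values)
        if abs(r - median) / mad > max_deviations
    ]


# Illustrative values only: the last image has a much higher noise level.
print(reject_by_rms([1.0, 1.1, 0.9, 1.05, 0.95, 8.0]))  # [5]
```

A median/MAD reference is used instead of mean/standard deviation because a single RFI-dominated image would otherwise drag the reference noise level upward and hide itself.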

gijzelaerr commented 9 years ago

Still, the sourcefinder shouldn't crash. @mkuiack tried to fix this by simply continuing past the exception, but then the sourcefinder returns invalid data, which crashes TraP later on.

mkuiack commented 9 years ago

Yvette asked me to rerun the /scratch/fhuizing/aartfaac/results/shower/s256/ images with the latest master and the same job_params to see whether I could reproduce her crash. So I made a new project and a new job, copied over the job_params.cfg and images_to_process.py files, cloned tkp from GitHub, and ran TraP. It crashed on image 174 with the same error as above; the results are in http://banana.transientskp.org/master/vlo_mark_db_2/dataset/1/

I then went back to my other directory and ran it with the same job_params.cfg and images_to_process.py, but with the tkp files I'd previously been modifying, and it ran all the way through again. The results are in http://banana.transientskp.org/master/vlo_mark_db_1/dataset/11/

In the run that worked, the dataset was added to a database that also contains Peeyush's images, and the image IDs reflect that: they run from ~6100 to ~6500 rather than 0 to 369. I'm not sure what else would be affected by mixing datasets in one database. Maybe sky coordinates? It still doesn't make sense if the problem is a bad Gaussian fit to a blended source in a high-noise image, though, so I don't know...

mkuiack commented 9 years ago

To answer my own question: yes, this does result in a shift in the measured coordinates of a source. For example, take the same image from both runs, http://banana.transientskp.org/master/vlo_mark_db_2/image/161/ and http://banana.transientskp.org/master/vlo_mark_db_1/image/6306/ , then look at its Running Catalogue entry: you can see differences in the position and the light curve. Whether these are within the known error bars I don't know. Dec ± 4'? That seems high!

gijzelaerr commented 8 years ago

@mkuiack, which runningcatalog are you referring to? When I look at the two images, everything looks the same.

gijzelaerr commented 8 years ago

OK, I get your point. I think the difference in positions can be explained by the unfinished run, which only got to about 50%. Since we are blind fitting on a lot of noise, I think a lot of 'drifting' is happening.
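To illustrate why a run that stops halfway can report a different catalogue position: if a running-catalogue position is an inverse-variance weighted mean of the per-image measurements (an assumption made here for illustration), then averaging over only the first half of the measurements generally gives a different answer than averaging over all of them, especially when noisy fits wander. The declination values below are made up and are not taken from either run.

```python
def weighted_mean(values, errors):
    """Inverse-variance weighted mean of measurements with 1-sigma errors."""
    weights = [1.0 / e**2 for e in errors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Made-up declination measurements (degrees) that drift later in the run.
decl = [52.00, 52.01, 51.99, 52.00, 52.06, 52.08, 52.07, 52.05]
err = [0.02] * len(decl)

half = len(decl) // 2
print(weighted_mean(decl[:half], err[:half]))  # "unfinished run": first half only
print(weighted_mean(decl, err))                # full run
```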

gijzelaerr commented 8 years ago

So this is actually a duplicate of this issue reported by @hsuyeep, right?

https://github.com/transientskp/tkp/issues/453

gijzelaerr commented 8 years ago

So yes, this is a duplicate. I'm closing this issue; please reopen it if you think otherwise.

transientskp/tkp issue #465