Closed: dentalfloss1 closed this issue 3 months ago.
Very interesting that you report this and thanks for the logs.
Is this from the master branch?
Does the problem persist when running from the Fix_SAWarning_copying_to_the_same_column_by_two_relationships branch?
This is from the r6.0c tag, which I believe is essentially the same as the master branch, right? I am trying this branch now.
From: Hanno Spreeuw, Sent: Thursday, June 22, 2023, Subject: Re: [transientskp/tkp] ForeignKeyViolation when running multiple frequencies (Issue #615)
It appears to produce the same error. trapdebuglog.txt traplog.txt
Thanks for checking this.
Trying to find out if this has anything to do with the Python 2 to 3 conversion of TraP and/or the simultaneous upgrade of SQLAlchemy...
Did you perhaps also run a Python 2 version of TraP on this or a similar dataset? If yes, was this successful?
If no, I might have to get my hands on a multi-frequency dataset myself.
edit: It would be easier for me to send you the data. Just let me know what would be the easiest way to do that.
Could you perhaps send me a link to a Dropbox folder?
Thanks. I was able to download these 2178 FITS files.
It completed with lots of warnings of this type:
```
../../tkp/sourcefinder/stats.py:15: RuntimeWarning: divide by zero encountered in divide
  help1 = clip_limit/(sigma*numpy.sqrt(2))
../../tkp/sourcefinder/stats.py:17: RuntimeWarning: invalid value encountered in multiply
  return sigma**2*(help2-2*numpy.sqrt(2)*help1*numpy.exp(-help1**2))-clipped_std**2*help2
../../lib/python3.10/site-packages/scipy/optimize/_minpack_py.py:175: RuntimeWarning: The iteration is not making good progress, as measured by the
  improvement from the last ten iterations.
  warnings.warn(msg, RuntimeWarning)
```
So there are problems with the kappa-sigma clipping. These can be caused by images that are not well calibrated, like this one: oims_25.100MHz_2023-05-30T08-23-05_stokesI.fits.
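For what it's worth, the divide-by-zero is easy to reproduce when the clipped pixel distribution collapses to zero spread, e.g. for a blanked or constant image. A small numpy sketch (the function name is mine; it only mirrors the expression shown in the warning above):

```python
import numpy as np

def clip_limit_term(clip_limit, sigma):
    # Mirrors help1 = clip_limit/(sigma*numpy.sqrt(2)) from the warning in
    # tkp/sourcefinder/stats.py. When the clipped pixels have zero spread,
    # sigma is 0 and the division yields inf, which then propagates as NaN
    # further down and derails scipy's root-finding iteration.
    with np.errstate(divide="ignore"):
        return clip_limit / (sigma * np.sqrt(2))

# A constant (e.g. blanked or badly calibrated) image has zero std:
flat_image = np.full((4, 4), 1.25)
print(clip_limit_term(3.0, flat_image.std()))  # inf
```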
Perhaps you could also provide me with your pipeline.cfg, job_params.cfg and inject.cfg files, since I was not yet able to reproduce your error.
injectcfg.txt job_paramscfg.txt pipeline_cfg.txt
Yeah, some of them, perhaps even a large number, are garbage. I was hoping to use the quality control to get rid of those automatically.
I can now produce a similar error, or actually the opposite error, `DETAIL: Key (runcat)=(96) already exists`:
```
scipy/optimize/_minpack_py.py:175: RuntimeWarning: The iteration is not making good progress, as measured by the
improvement from the last ten iterations.
  warnings.warn(msg, RuntimeWarning)
WARNING:root:position errors extend outside image
WARNING:root:position errors extend outside image
WARNING:root:position errors extend outside image
09:44:24 INFO tkp.main: performed 136 forced fits in 6 images
09:44:24 INFO tkp.main: calculating variability metrics
09:44:24 ERROR tkp.main: timestep raised <class 'sqlalchemy.exc.IntegrityError'> exception: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "ix_varmetric_runcat"
DETAIL: Key (runcat)=(96) already exists.
[SQL: INSERT INTO varmetric (runcat, v_int, eta_int, band, newsource, sigma_rms_max, sigma_rms_min, lightcurve_max, lightcurve_avg, lightcurve_median) SELECT runcat, v_int, eta_int, band, newsource, sigma_rms_max, sigma_rms_min, lightcurve_max, lightcurve_avg, lightcurve_median
FROM (SELECT r.id AS runcat, r.wm_ra AS ra, r.wm_decl AS decl, r.wm_uncertainty_ew AS wm_uncertainty_ew, r.wm_uncertainty_ns AS wm_uncertainty_ns, r.xtrsrc AS xtrsrc, r.dataset AS dataset_id, r.datapoints AS datapoints, match_assoc.v_int AS v_int, match_assoc.eta_int AS eta_int, match_img.band AS band, newsrc_trigger.id AS newsource, newsrc_trigger.sigma_rms_max AS sigma_rms_max, newsrc_trigger.sigma_rms_min AS sigma_rms_min, max(agg_ex.f_int) AS lightcurve_max, avg(agg_ex.f_int) AS lightcurve_avg, median(agg_ex.f_int) AS lightcurve_median
FROM (SELECT a_lt.runcat AS runcat_id, max(e_lt.f_int) AS max_flux
FROM (SELECT a_laids.id AS assoc_id, last_assoc_timestamps.max_time AS max_time, last_assoc_timestamps.band AS band, last_assoc_timestamps.runcat AS runcat
FROM (SELECT r_timestamps.id AS runcat, max(i_timestamps.taustart_ts) AS max_time, i_timestamps.band AS band
FROM runningcatalog AS r_timestamps JOIN assocxtrsource AS a_timestamps ON r_timestamps.id = a_timestamps.runcat JOIN extractedsource AS e_timestamps ON a_timestamps.xtrsrc = e_timestamps.id JOIN image AS i_timestamps ON i_timestamps.id = e_timestamps.image
WHERE %(param_1)s = i_timestamps.dataset GROUP BY r_timestamps.id, i_timestamps.band) AS last_assoc_timestamps JOIN assocxtrsource AS a_laids ON a_laids.runcat = last_assoc_timestamps.runcat JOIN extractedsource AS e_laids ON a_laids.xtrsrc = e_laids.id JOIN image AS i_laids ON i_laids.id = e_laids.image AND i_laids.taustart_ts = last_assoc_timestamps.max_time) AS last_assoc_per_band JOIN assocxtrsource AS a_lt ON a_lt.id = last_assoc_per_band.assoc_id JOIN extractedsource AS e_lt ON a_lt.xtrsrc = e_lt.id GROUP BY a_lt.runcat) AS last_ts_fmax JOIN assocxtrsource AS match_assoc ON match_assoc.runcat = last_ts_fmax.runcat_id JOIN extractedsource AS match_ex ON match_assoc.xtrsrc = match_ex.id AND match_ex.f_int = last_ts_fmax.max_flux JOIN runningcatalog AS r ON r.id = last_ts_fmax.runcat_id JOIN image AS match_img ON match_ex.image = match_img.id LEFT OUTER JOIN (SELECT n_ntr.id AS id, n_ntr.runcat AS rc_id, e_ntr.f_int / i_ntr.rms_min AS sigma_rms_min, e_ntr.f_int / i_ntr.rms_max AS sigma_rms_max
FROM newsource AS n_ntr JOIN extractedsource AS e_ntr ON e_ntr.id = n_ntr.trigger_xtrsrc JOIN image AS i_ntr ON i_ntr.id = n_ntr.previous_limits_image
WHERE %(param_2)s = i_ntr.dataset) AS newsrc_trigger ON newsrc_trigger.rc_id = r.id JOIN assocxtrsource AS agg_assoc ON r.id = agg_assoc.runcat JOIN extractedsource AS agg_ex ON agg_assoc.xtrsrc = agg_ex.id JOIN image AS agg_img ON agg_ex.image = agg_img.id AND agg_img.band = match_img.band
WHERE %(param_3)s = r.dataset GROUP BY r.id, r.wm_ra, r.wm_decl, r.wm_uncertainty_ew, r.wm_uncertainty_ns, r.xtrsrc, r.dataset, r.datapoints, match_assoc.v_int, match_assoc.eta_int, match_img.band, newsrc_trigger.id, newsrc_trigger.sigma_rms_max, newsrc_trigger.sigma_rms_min) AS anon_1]
[parameters: {'param_1': 1, 'param_2': 1, 'param_3': 1}]
(Background on this error at: http://sqlalche.me/e/14/gkpj)
```
This happens directly after problems with kappa, sigma clipping, so that may be related.
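For reference, the failure mode in the traceback can be modelled in a few lines, with SQLite standing in for PostgreSQL. The upsert at the end is a hypothetical mitigation for illustration only, not what TraP actually does:

```python
import sqlite3

# The unique index matches the one named in the traceback (ix_varmetric_runcat).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE varmetric (runcat INTEGER, v_int REAL)")
con.execute("CREATE UNIQUE INDEX ix_varmetric_runcat ON varmetric (runcat)")

con.execute("INSERT INTO varmetric VALUES (96, 0.1)")  # first timestep
try:
    # A later timestep re-inserting metrics for the same runcat source:
    con.execute("INSERT INTO varmetric VALUES (96, 0.2)")
except sqlite3.IntegrityError as exc:
    print(exc)  # UNIQUE constraint failed: varmetric.runcat

# Hypothetical fix: upsert instead of insert, so a repeated timestep
# updates the existing metrics row rather than duplicating the key.
con.execute(
    "INSERT INTO varmetric (runcat, v_int) VALUES (96, 0.2) "
    "ON CONFLICT (runcat) DO UPDATE SET v_int = excluded.v_int"
)
print(con.execute("SELECT v_int FROM varmetric WHERE runcat = 96").fetchone())  # (0.2,)
```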
Setting `method = "serial"` instead of `multiproc` does not seem to make a difference.
I will try again with the latest Python 2 release of TraP.
Same error:

```
14:59:26 ERROR tkp.main: timestep raised <class 'sqlalchemy.exc.IntegrityError'> exception: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "ix_varmetric_runcat"
DETAIL: Key (runcat)=(96) already exists.
```

using TraP r5.0 with Python 2.
Good to know that this does not come from the Python 2 --> 3 conversion with its simultaneous SQLAlchemy upgrade.
this is the first time I have seen this error appear so it seems to be something with having multiple images on the same timestep.
Before spending more time on this, I want to check with @AntoniaR and @jdswinbank if TraP should provide for this.
Hi both, just had chance to catch up on this issue.
I think this is an issue that I have seen before. It is basically saying that the source it is trying to update has been removed from the database. My guess is that the following is happening:
This is the section of code I am referring to: https://github.com/transientskp/tkp/blob/7a1a175b7205aac13d559de0afd1a02637f83a05/tkp/main.py#L362-L377
The behaviour we wanted was that TraP did not include forced fit measurements for newly detected sources in that specific timestep. This was to prevent issues like this.
Does this help track down the error?
@dentalfloss1 in the meantime, this error may be triggered by your current settings. I would strongly recommend doing some image-plane quality control before running TraP; then you can either give it max and min rms values or a list of good images to process. See this code for an example: https://github.com/transientskp/TraP_tools/tree/master/PreTraPimageQC/QC.
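A minimal sketch of that kind of pre-TraP rms cut (my own illustration, not taken from the linked TraP_tools script): estimate each image's noise via iterative sigma clipping and keep only images whose rms falls inside a chosen window.

```python
import numpy as np

def estimate_rms(image, clip=3.0, iters=3):
    # Rough noise estimate via iterative sigma clipping; a simplified
    # stand-in for a proper image-plane QC measurement.
    data = image[np.isfinite(image)].ravel()
    for _ in range(iters):
        data = data[np.abs(data - data.mean()) < clip * data.std()]
    return data.std()

def passes_qc(image, rms_min, rms_max):
    # Keep only images whose noise lies inside [rms_min, rms_max];
    # the thresholds come from inspecting the whole dataset beforehand.
    return rms_min < estimate_rms(image) < rms_max

rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, (128, 128))   # noise at the expected level
bad = rng.normal(0.0, 50.0, (128, 128))   # badly calibrated image
print(passes_qc(good, 0.5, 5.0), passes_qc(bad, 0.5, 5.0))  # True False
```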
Then I would also look at the source finding and association settings as I expect these to be somewhat different for LWA data. From the job_params.txt you sent earlier, I would change:
```
ew_sys_err = 10
; Systematic errors on ra & decl (units in arcsec)
ns_sys_err = 10
```
These are likely much higher for LWA images, as they are much lower resolution than MeerKAT's. I'd recommend testing what the typical position offset is outside of TraP and then using that value. I would expect it to be of order arcminutes rather than arcseconds.
Also:

```
beamwidths_limit = 1.0
```
I tend to set this to 3.0 as this gets more reliable source associations - especially if sources are shifting slightly due to ionospheric effects (which are stronger at LWA frequencies).
You may also want to investigate the back_size values as these seem quite small - but it does depend on your images.
I would also strongly recommend using a higher detection threshold than 5 sigma until you are confident with your other settings.
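Collected together, the suggested job_params.cfg changes might look like the fragment below. The section names follow my recollection of the TraP job_params layout and the numbers are placeholders to tune against the LWA images, so treat the whole fragment as illustrative:

```ini
; Illustrative only: values must be tuned per dataset.
[source_extraction]
detection_threshold = 8     ; raise above 5 sigma until the other settings are trusted
ew_sys_err = 60             ; arcsec; expect arcmin-scale offsets for LWA
ns_sys_err = 60
; also revisit back_size_x / back_size_y, which may be too small

[association]
beamwidths_limit = 3.0      ; more reliable associations under ionospheric shifts
```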
I did a little digging on this and it turns out it's quite interesting. There is a runningcatalog source that gets returned in one of the images in the timestep by tkp.db.nulldetections.get_nulldetections(). This source doesn't show up in any of the blind extractions from tkp.main.store_extractions(). Furthermore, when you issue the same SQL command after TraP quits, the source no longer shows up.
@dentalfloss1 Sorry for my late response. This sounds like a bug in tkp.db.nulldetections.get_nulldetections. Would you draw the same conclusion?
Maybe? I don't really recall at this point. I remember wondering if the problem had something to do with SQLAlchemy trying to run SQL commands out of order or at the same time, but I never reached a satisfying conclusion. For my case I ended up turning off the forced fits and only doing blind extractions, which is not really a solution.
We should check if this is still an issue in the R7 TraP
Let me know if you need any info from me or want me to try anything.
@dentalfloss1 do you have a small dataset that can reproduce the error? Then we can test it in the new TraP. Thanks!
Annoyingly, I can't seem to replicate this issue anymore. I guess you can close it?
I've even tried using singularity images of TraP that I previously used.
Ok, let's close this issue for now but do please reopen or file a new issue if you see it again. Thanks!
I was running LWA data through TraP with multiple frequencies, and at 269/363 images I ran into the following error:
Since using the latest TraP version, 6 (is that right?), this is the first time I have seen this error appear so it seems to be something with having multiple images on the same timestep. Attached are the logs.
trap_debug_log.txt traplog.txt