Closed GoogleCodeExporter closed 9 years ago
How do we know that the data arrives broken from the server?
It is fairly easy to fix the data (estimate the timestamp) or at least to warn
about the broken data.
One approach is to put the broken records in a separate queue. When the first
correct record arrives, fix the timestamps on the queued records and forward
them in the correct order.
Another approach is to drop the record with a warning and increment a separate
error counter. Maybe the problem is a broken database and these trades do not
exist or are from a previous day.
Yet another approach is to fix the date in real time by using the last correct
timestamp plus the time elapsed since the last record.
Original comment by larytet@gmail.com
on 13 Oct 2009 at 9:28
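The queue approach from the comment above could look roughly like the sketch
below. It is only an illustration: the `Record` type, the `seq` field, the use
of `None` to mark a broken timestamp, and the linear interpolation between the
last good and the next good timestamp are all assumptions, not something the
thread specifies.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    seq: int                    # arrival order (hypothetical field)
    timestamp: Optional[float]  # None marks a broken timestamp (assumption)


class RepairQueue:
    """Holds records with broken timestamps until a correct record arrives."""

    def __init__(self) -> None:
        self.pending: List[Record] = []
        self.last_good: Optional[float] = None

    def push(self, rec: Record) -> List[Record]:
        """Return the records that can be forwarded now, in arrival order."""
        if rec.timestamp is None:
            self.pending.append(rec)
            return []
        # A correct record arrived: estimate timestamps for the queued ones.
        fixed = self._fix_pending(rec.timestamp)
        self.last_good = rec.timestamp
        return fixed + [rec]

    def _fix_pending(self, next_good: float) -> List[Record]:
        # Spread the queued records evenly between the two good timestamps.
        # With no earlier good timestamp, they all get the next good one.
        start = self.last_good if self.last_good is not None else next_good
        step = (next_good - start) / (len(self.pending) + 1)
        fixed = [replace(r, timestamp=start + step * (i + 1))
                 for i, r in enumerate(self.pending)]
        self.pending = []
        return fixed
```

Records with a good timestamp pass straight through; broken ones are released
only together with the next good record, so the output order matches the
arrival order.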
This specific field itself is not very interesting. It's probably better to
correct it: the issue is that the first character in the string is either
dropped or replaced by '0' (further analysis will probably reveal other error
patterns). At this stage I wouldn't drop any data (and I also continue to save
the logs in the original format). For now, all the data cleaning will be done
manually, using data analysis software (Matlab, SAS, R, whatever). Next, when
the problems with the data are clear, we build a data-cleaning algorithm and
add it to the project.
Original comment by jerusale...@gmail.com
on 13 Oct 2009 at 9:40
Field UPD_TIME is always equal to or larger than the previous value in the
attached logs.
Original comment by larytet@gmail.com
on 13 Oct 2009 at 9:46
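A check like the one described above (UPD_TIME never decreases) can be
automated in a few lines. This sketch assumes the UPD_TIME values have already
been parsed into comparable numbers; the field's actual text format is not
shown in the thread, and the function name is made up for illustration.

```python
from typing import List, Sequence


def find_time_regressions(upd_times: Sequence[float]) -> List[int]:
    """Return indices of records whose UPD_TIME is smaller than the one before."""
    return [i for i in range(1, len(upd_times))
            if upd_times[i] < upd_times[i - 1]]
```

An empty result means the log satisfies the "equal or larger than the previous
value" property.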
I would add a simple RezefDataValidator (child of RxDataValidator) which checks
all fields for a reasonable number of characters and for illegal characters.
RxDataValidator (base class) will implement a method Result() which will return
"broken" if the record cannot be fixed.
Record validation is a stateful process which can be aware of the previous
history. For example, whether the record is the very first record in the stream.
RezefDataValidator will maintain error counters. RezefDataValidator will fix a
field when possible. RezefDataValidator.Result() will return the status of the
latest validation.
Original comment by larytet@gmail.com
on 13 Oct 2009 at 9:57
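The validator design proposed above might be sketched as follows. The class
names come from the comment; everything else is an assumption made for
illustration: the `validate()` method, the status strings other than "broken",
the `RULES` table with its length limit and allowed characters, and the choice
of whitespace stripping as the only "fixable" defect. The method is spelled
`result()` here to follow Python naming.

```python
import string
from typing import Dict


class RxDataValidator:
    OK, FIXED, BROKEN = "ok", "fixed", "broken"

    def __init__(self) -> None:
        self._result = self.OK
        self.errors: Dict[str, int] = {}  # per-field error counters
        self.records_seen = 0             # state: lets rules know the history

    def result(self) -> str:
        """Status of the latest validation."""
        return self._result

    def validate(self, record: Dict[str, str]) -> Dict[str, str]:
        raise NotImplementedError


class RezefDataValidator(RxDataValidator):
    # Hypothetical rules: field name -> (max length, allowed characters).
    RULES = {"UPD_TIME": (8, string.digits + ":")}

    def validate(self, record: Dict[str, str]) -> Dict[str, str]:
        self.records_seen += 1
        status = self.OK
        cleaned = dict(record)
        for name, (max_len, allowed) in self.RULES.items():
            raw = cleaned.get(name, "")
            value = raw.strip()
            if (not value or len(value) > max_len
                    or any(c not in allowed for c in value)):
                # Cannot be repaired: count it and mark the record broken.
                self.errors[name] = self.errors.get(name, 0) + 1
                status = self.BROKEN
                continue
            if value != raw:
                # Repairable defect (here: surrounding whitespace).
                cleaned[name] = value
                self.errors[name] = self.errors.get(name, 0) + 1
                if status != self.BROKEN:
                    status = self.FIXED
        self._result = status
        return cleaned
```

The caller validates each record and then inspects `result()` to decide whether
to forward, log, or drop it.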
Field UPD_TIME is supposed to be equal to or larger than the previous value: it
is the time part of the timestamp attached to the record by the remote server.
As K300 events arrive in a synchronous manner, no wonder that they are written
to the log in the order they arrive, right?
Original comment by jerusale...@gmail.com
on 13 Oct 2009 at 10:12
We can use UPD_TIME to fix the time stamps
Original comment by larytet@gmail.com
on 13 Oct 2009 at 11:50
I am closing this issue since it's clear now that UPD_TIME is created by the
TaskBar using local machine time and the system timer, which is low-res.
Attaching our own timestamp using a high-resolution timer solves the problem.
LST_DL_TM is fixed now, since the system was configured to communicate with
dedicated K300 / Orders servers instead of AS/400.
Original comment by jerusale...@gmail.com
on 10 Nov 2009 at 11:05
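For illustration, attaching a custom high-resolution timestamp on arrival could
look like the sketch below. It is not the project's code: the `ArrivalClock`
class, the `stamp()` helper, and the `arrival_ns` field name are hypothetical.
It anchors a monotonic high-resolution counter to the wall clock once, so later
stamps never go backwards.

```python
import time


class ArrivalClock:
    """Wall-clock base plus a high-resolution monotonic offset, in ns."""

    def __init__(self) -> None:
        self._base_wall_ns = time.time_ns()
        self._base_mono_ns = time.perf_counter_ns()

    def now_ns(self) -> int:
        elapsed = time.perf_counter_ns() - self._base_mono_ns
        return self._base_wall_ns + elapsed


def stamp(event: dict, clock: ArrivalClock) -> dict:
    """Return a copy of the event with our own arrival timestamp attached."""
    return {**event, "arrival_ns": clock.now_ns()}
```

The original fields of the event stay untouched, so the server-side time fields
remain available for comparison.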
I just thought that the actual protocol (UDP packet) running between server and
client may contain a timestamp which TaskBar simply ignores.
Original comment by larytet@gmail.com
on 10 Nov 2009 at 12:08
Original issue reported on code.google.com by
larytet@gmail.com
on 13 Oct 2009 at 9:24