larytet / larytet-master

Automatically exported from code.google.com/p/larytet-master

Broken date/time field #9

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

There is an issue with the data: the field LST_DL_TM sometimes arrives broken.
This field is the time (minute resolution) at which the last (latest)
transaction took place. For instance, it may be "00:21" instead of "10:21", or
" 6:12" instead of "16:12". I don't think this is caused by I/O processing in
the application; the data seems to arrive already broken from the FMR server.
Please take a look at the real data log samples, attached (you can see the
broken data in the Rezef file).

Original issue reported on code.google.com by larytet@gmail.com on 13 Oct 2009 at 9:24

GoogleCodeExporter commented 9 years ago
How do we know that the data arrives broken from the server?
It is fairly easy to fix the data (estimate the timestamp) or at least warn
about the broken data.
One approach to fixing the data is to put the broken records in a separate
queue. When the first correct record arrives, fix the timestamps on the records
in the queue and forward the records in the correct order.
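
A minimal sketch of the queue approach, in Python for illustration only. The class `RepairQueue` and the helpers `leading_char_missing` and `borrow_first_char` are hypothetical names, not part of the project. The sketch assumes records are dictionaries with an `LST_DL_TM` key, and the example fix is deliberately naive: it only catches the dropped-character case (" 6:12") and borrows the first digit from the next correct record.

```python
import re
from collections import deque
from typing import Callable, Deque, Dict

Record = Dict[str, str]

# Assumption: a correct LST_DL_TM value looks like "HH:MM".
WELL_FORMED = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def leading_char_missing(record: Record) -> bool:
    """Hypothetical check: flags values such as ' 6:12', but not '00:21'."""
    return WELL_FORMED.fullmatch(record["LST_DL_TM"]) is None


def borrow_first_char(broken: Record, good: Record) -> Record:
    """Hypothetical fix: take the first digit from the next correct record."""
    fixed_time = good["LST_DL_TM"][0] + broken["LST_DL_TM"][1:]
    return {**broken, "LST_DL_TM": fixed_time}


class RepairQueue:
    """Holds broken records until a correct one arrives, then releases
    everything in the original order."""

    def __init__(self,
                 is_broken: Callable[[Record], bool],
                 fix: Callable[[Record, Record], Record],
                 forward: Callable[[Record], None]) -> None:
        self._is_broken = is_broken
        self._fix = fix
        self._forward = forward
        self._pending: Deque[Record] = deque()

    def push(self, record: Record) -> None:
        if self._is_broken(record):
            # Hold the record back; nothing is forwarded yet.
            self._pending.append(record)
            return
        # A correct record arrived: repair and flush the backlog first.
        while self._pending:
            self._forward(self._fix(self._pending.popleft(), record))
        self._forward(record)
```

The queue keeps ordering intact at the cost of delaying broken records until the next correct one shows up, so a long run of broken records would stall the stream.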

Another approach is to drop the record with a warning and a separate error
counter. Maybe the problem is a broken database and these trades do not exist
or are from a previous day.

Yet another approach is to fix the timestamp in real time by using the last
correct timestamp plus the time elapsed since the last record.
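
A rough sketch of the real-time approach, in Python for illustration only. All function names here are hypothetical. It assumes the value stays five characters wide ("HH:MM") and that only the first character is damaged, as in the samples quoted in this issue. The expected time is the last correct value plus the elapsed time since it was received, and the repair picks the candidate hour digit that lands closest to that expectation.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def expected_minutes(last_good: str, elapsed_seconds: float) -> int:
    """Last correct time plus the time elapsed since it was received."""
    return to_minutes(last_good) + int(elapsed_seconds // 60)


def repair_time(raw: str, expected: int) -> str:
    """Guess the intended 'HH:MM' for a value with a damaged first character.

    Assumption: only the first character is wrong, so the remaining
    'H:MM' part can be trusted.
    """
    tail = raw[1:]
    # The first character of a valid hour can only be 0, 1 or 2.
    candidates = [digit + tail for digit in "012"]
    # Discard impossible hours such as '26:12'.
    valid = [c for c in candidates if to_minutes(c) < 24 * 60]
    return min(valid, key=lambda c: abs(to_minutes(c) - expected))


# Usage: the last correct value was "10:20", received 45 seconds ago.
print(repair_time("00:21", expected_minutes("10:20", 45)))
```

This should only be applied to values already flagged as suspect, since a correct value far from the expected time would otherwise be rewritten as well.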

Original comment by larytet@gmail.com on 13 Oct 2009 at 9:28

GoogleCodeExporter commented 9 years ago
This specific field itself is not very interesting. It's probably better to
correct it: the issue is that the first character in the string is either
dropped or replaced by '0' (further analysis will probably reveal other error
patterns). At this stage I wouldn't drop any data (and I also continue to save
the logs in the original format). For now, all the data cleaning will be done
manually, using data analysis software (Matlab, SAS, R, whatever). Next, when
the problems with the data are clear, we can build a data-cleaning algorithm
and add it to the project.

Original comment by jerusale...@gmail.com on 13 Oct 2009 at 9:40

GoogleCodeExporter commented 9 years ago
Field UPD_TIME is always equal to or larger than the previous value in the
attached logs.
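
An observation like this can be checked mechanically. The snippet below is only an illustration in Python; `first_decrease` is a hypothetical helper, and it assumes the UPD_TIME values have already been extracted from the log into a list of comparable values (for example zero-padded time strings of equal width, which compare correctly as text).

```python
from typing import Optional, Sequence


def first_decrease(values: Sequence) -> Optional[int]:
    """Return the index of the first value smaller than its predecessor,
    or None if the sequence never decreases."""
    for i in range(1, len(values)):
        if values[i] < values[i - 1]:
            return i
    return None
```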

Original comment by larytet@gmail.com on 13 Oct 2009 at 9:46

GoogleCodeExporter commented 9 years ago
I would add a simple RezefDataValidator (child of RxDataValidator) which checks
all fields for a reasonable number of characters and for illegal characters.

RxDataValidator (base class) will implement the method Result(), which will
return "broken" if the record cannot be fixed.

Record validation is a stateful process which can be aware of the previous
history; for example, whether the record is the very first record in the
stream.

RezefDataValidator will support error counters. RezefDataValidator will fix the
field when possible. RezefDataValidator.Result() will return the status of the
latest validation.
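
The design could look roughly like the sketch below, written in Python for illustration only. The class names and Result() come from the description above; everything else (the `Status` enum, the `Validate` method, the `errors` counter, checking only LST_DL_TM, and the first-digit heuristic) is an assumption made for the example. The heuristic borrows the first digit from the last correct value, so it will guess wrong around a change of the leading hour digit (09:59 to 10:00), and it does not detect a well-formed but wrong value such as "00:21".

```python
import re
from collections import Counter
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    OK = "ok"
    FIXED = "fixed"
    BROKEN = "broken"


class RxDataValidator:
    """Base class: keeps error counters and the status of the latest validation."""

    def __init__(self) -> None:
        self._result = Status.OK
        self.errors: Counter = Counter()

    def Validate(self, record: Dict[str, str]) -> Dict[str, str]:
        raise NotImplementedError

    def Result(self) -> Status:
        return self._result


class RezefDataValidator(RxDataValidator):
    FIELD = "LST_DL_TM"
    # Checks both the length (five characters) and the legal characters.
    PATTERN = re.compile(r"([01]\d|2[0-3]):[0-5]\d")

    def __init__(self) -> None:
        super().__init__()
        # State: None until the first correct record has been seen.
        self._last_good: Optional[str] = None

    def Validate(self, record: Dict[str, str]) -> Dict[str, str]:
        value = record.get(self.FIELD, "")
        if self.PATTERN.fullmatch(value):
            self._last_good = value
            self._result = Status.OK
            return record
        self.errors[self.FIELD] += 1
        if self._last_good is not None and len(value) == 5:
            # Heuristic (assumption): reuse the first digit of the last good value.
            guess = self._last_good[0] + value[1:]
            if self.PATTERN.fullmatch(guess):
                self._result = Status.FIXED
                return {**record, self.FIELD: guess}
        # No history yet, or the guess is still malformed: cannot be fixed.
        self._result = Status.BROKEN
        return record
```

A broken record that arrives first in the stream is reported as "broken" because there is no history to repair it from, which is the kind of state awareness described above.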

Original comment by larytet@gmail.com on 13 Oct 2009 at 9:57

GoogleCodeExporter commented 9 years ago
Field UPD_TIME is supposed to be equal to or larger than the previous value: it
is the time part of the timestamp attached to the record by the remote server.
As k300 events arrive in a synchronous manner, it is no wonder that they are
written to the log in the order they arrive, right?

Original comment by jerusale...@gmail.com on 13 Oct 2009 at 10:12

GoogleCodeExporter commented 9 years ago
We can use UPD_TIME to fix the time stamps

Original comment by larytet@gmail.com on 13 Oct 2009 at 11:50

GoogleCodeExporter commented 9 years ago
I am closing this issue since it is now clear that UPD_TIME is created by the
TaskBar using local machine time and the system timer, which is low-res.
Attaching our own timestamp using a high-resolution timer solves the problem.
LST_DL_TM is fixed now, since the system was configured to communicate with
dedicated K300 / Orders servers instead of AS/400.

Original comment by jerusale...@gmail.com on 10 Nov 2009 at 11:05

GoogleCodeExporter commented 9 years ago
I just thought that the actual protocol (UDP packets) running between the
server and the client may contain a timestamp that TaskBar simply ignores.

Original comment by larytet@gmail.com on 10 Nov 2009 at 12:08