Missing entry in TOC file for first video stimulus in playlist mode.

davidt0x commented 2 years ago

This bug occurs when using video stimulus playlist entries without any timing information specified. The TOC file output after the experiment is usually missing the first video stimulus played during the experiment.

davidt0x commented 2 years ago

Ok, I think I have figured out why this TOC issue is happening with video playlists with no timing info. It happens because when the main video server thread runs it creates the VideoStimPlaylist object:

https://github.com/murthylab/fly-vr/blob/5dac72aee429a5207825986cf7589621a29e07f9/flyvr/video/video_server.py#L1188-L1190

whose constructor calls self.play_item immediately if not paused:

https://github.com/murthylab/fly-vr/blob/5dac72aee429a5207825986cf7589621a29e07f9/flyvr/video/video_server.py#L91-L92

This play_item method in turn updates the state of the playlist (advances the internal index) and then if and only if the fly_vr shared state variables are set it signals to append to the TOC file.:

https://github.com/murthylab/fly-vr/blob/5dac72aee429a5207825986cf7589621a29e07f9/flyvr/video/video_server.py#L154-L159

The problem is that the constructor of the VideoStimPlaylist sets this flyvr_shared_state variable to None by default. This gets fixed later when the next item in the playlist is invoked by the main server loop after picking up the playlist on the message queue:

https://github.com/murthylab/fly-vr/blob/5dac72aee429a5207825986cf7589621a29e07f9/flyvr/video/video_server.py#L1009

It seems to me that the first thing todo would be remove the reference of the flyvr shared state from the VideoPlaylist entirely and put this signaling responsibility onto the server main loop. This seems like a separation of concerns issue. The code for managing the playlist sequencing shouldn't be in charge of signaling anything to the state of the rest of the program. This is how it is in the audio playlist case I believe. There could be something I am missing for why this is tricky in the video case though. Alternatively, we could just pass the flyvr_share_state through to the VideoPlaylistStim constructor. Though, I can't see any real reason this playlist code needs to know about the state of the whole system.

I will try to implement this fix locally and test on the setup.

mwangq commented 2 years ago

hey, I ran another quick test this morning using this config file: config_avd_producer_test.txt

I wanted to test two things:

1) whether the toc file looks the same as the time/do instructions from the config file (close but not quite... there seem to be some extra entries in daq, some lost entries in audio) 20220210_1122.toc.txt

2) whether the producer_playlist_n is being incremented for daq stimuli (it is not. I wanted to test this because yday I noticed the producer_playlist_n is not incremented for audio stimuli either) (also reflected in the sync info in daq.h5 and sound_server.h5)

Originally I wanted to use the toc for two purposes: to get the numerical ID (e.g. 1) for a given playlist item (e.g. v_gratingL) (as described in the flyVR sync doc ) and then also to use the toc to help me figure out which items were played when (because this is not hardcoded if I am randomly selecting and playing items, using an experiment.py).

I can almost figure out which items were played when, from a combo of producer_playlist_n and producer_instance_n in the sync data (assuming each instance_n made its way onto the toc file so I could figure out that producer_instance_n of 1 = v_gratingL). And producer_playlist_n can be roughly back-calculated by looking at the blocks of producer_instance_n. That is true if the stimulus is switching every time. It is not true, for instance, in this config file, where I have some stimuli (e.g. audio) play the same thing while I change another backend (e.g. daq).

Not sure if you prefer to open a separate issue for producer_playlist_n increment. I include the comment here because it is related to users being able to pull out when stimuli were played, which I thought the toc was also supposed to be for. and the "-1" values for producer_playlist_n show up in the daq/audio lines of the toc. I glanced at video_server.py and I see the producer_playlist_n increment there; I could probably implement it in sound_server.py but don't want to potentially break anything... (edit: I made some quick changes to sound_server.py and I think I have the producer_playlist_n incrementing now. nothing has broken yet...)

nzjrs commented 2 years ago

a small related fix is that for clarity only one experiment _start should be written to the TOC

https://github.com/murthylab/fly-vr/blob/bf4718a700b29a7afe4b3d0ff44606c343eb20e7/flyvr/hwio/phidget.py#L165

nzjrs commented 2 years ago

hmm, for backends that output chunks (daq,audio) the produce_playlist_n should be incremented in the chunker/data_generator. A test is here that seems to be testing this

https://github.com/murthylab/fly-vr/blob/bf4718a700b29a7afe4b3d0ff44606c343eb20e7/tests/audio/test_samplechunks.py#L218

nzjrs commented 2 years ago

(alternative hypothesis is that missing TOC lines are from lost IPC messages - best to increase logging and check each send per backend gets received by the TOC writer)

davidt0x commented 2 years ago

It seems to me the missing entries in TOC are only ones where the stimulus is repeating? Am I missing something? If that is the case, it is probably this line of code that is the culprit:

https://github.com/murthylab/fly-vr/blob/5dac72aee429a5207825986cf7589621a29e07f9/flyvr/audio/sound_server.py#L398-L413

nzjrs commented 2 years ago

It seems to me the missing entries in TOC are only ones where the stimulus is repeating? Am I missing something? If that is the case, it is probably this line of code that is the culprit:

https://github.com/murthylab/fly-vr/blob/5dac72aee429a5207825986cf7589621a29e07f9/flyvr/audio/sound_server.py#L398-L413

The simple sequential case seems to be well enough tested here https://github.com/murthylab/fly-vr/blob/master/tests/audio/test_samplechunks.py

but I think this is the right track. I believe I know what is going on.

the time based experiment plays items, which hits this

https://github.com/murthylab/fly-vr/blob/5dac72aee429a5207825986cf7589621a29e07f9/flyvr/audio/sound_server.py#L203

which creates a new data_generator

but I think this check

https://github.com/murthylab/fly-vr/blob/5dac72aee429a5207825986cf7589621a29e07f9/flyvr/audio/signal_producer.py#L64

does not catch what is a new chunk from a new data generator. I would suggest to confirm that producer_instance_n is incremented correctly resulting from the and also check producer_instance_n equality in chunk_producers_differ. however, I'm AFK from my windows dev PC - so it could be in fact that this was not the design of producer_instance_n and so chunk_producers_differ should also check the hash() of the creating data generator to handle this case

(I can test next week FWIW)

nzjrs commented 2 years ago

(I can test next week FWIW)

for the above, a hello world test would be to make a test that essentially calls play_item on a playlist for a non sequential sequence of items in the playlist, and checks the subsequent first chunks from the returned generatorS are detected as chunk_producers_differ() == True

mwangq commented 2 years ago

hmm, for backends that output chunks (daq,audio) the produce_playlist_n should be incremented in the chunker/data_generator.

strange news. I found some .toc from Dec and Jan where the (audio) chunk_producer_playlist_n incremented as expected. all of these experiments were audio backend only, using a variety of stimuli (custom matlab made stimuli and "sin" item types). I wonder if the .toc issue I showed above (where the chunk_producer_playlist_n are all -1, for both daq and audio) might have something to do with multiple backends...

nzjrs commented 2 years ago

hmm, for backends that output chunks (daq,audio) the produce_playlist_n should be incremented in the chunker/data_generator.

strange news. I found some .toc from Dec and Jan where the (audio) chunk_producer_playlist_n incremented as expected. all of these experiments were audio backend only, using a variety of stimuli (custom matlab made stimuli and "sin" item types). I wonder if the .toc issue I showed above (where the chunk_producer_playlist_n are all -1, for both daq and audio) might have something to do with multiple backends...

I would expect this to have worked if the experiment was not dynamic - i.e. didn't use the experiment time stanza. If it was simply a predefined sequential playlist that played in order, then all the chunk_producer_playlist_n and friends will all be incremented correctly

that is, as I said https://github.com/murthylab/fly-vr/issues/16#issuecomment-1035515506 it's probbably a function of tearing down and recreating the data_generator which confuses the chunker change detection

nzjrs commented 2 years ago

I believe all issues here should be fixed by these 2 commits

I could only test this on Linux so the commits should be cherry picked appropriately and the branch not merged wholesale.

mwangq commented 2 years ago

hi @nzjrs, yep, the experiments I mentioned in my last comment (where producer_playlist_n incremented as expected) were all experiments that played off a playlist, not an experiment time stanza or an experiment.py.

I pulled the 3 files you edited: flyvr/audio/signal_producer.py flyvr/audio/stimuli.py tests/audio/test_samplechunks.py

and re-ran the same config file as yesterday, which has audio, video, daq backends, for which the playlist_instance_n should be audio: [2, 2, 0, 1, 2], video: [2, 2, 0, 0, 1], daq: [1, 0, 0, 2, 0]. I was hoping that the toc file would match exactly the instructed time/do stanza (5 entries each from audio, video, daq = 15 toc entries), and the producer_playlist_n for each backend would increment from 0 to 4.

for the test yesterday 20220210_1122.toc.txt: the chunk_producer_instance_n in toc was correct, but the chunk_producer_playlist_n were all -1 for audio and daq. video increments fine. there was a missing audio toc entry, and an additional daq toc entry.

for the test today, 20220211_1244.toc.txt: video is fine again, but the chunk_producer_instance_n has some strange -106, -107 ... values, and the chunk_producer_playlist_n now looks like chunk_producer_instance_n

maybe instance and playlist variables were accidentally swapped somewhere
maybe there were additional files I should have pulled... ?
there are still 2 additional daq entries on the toc before other things (the other backends?) come online (can look at the fictrac_frame_num to see the triplets of toc lines indicating a do: from the config file.).

nzjrs commented 2 years ago

for the test today, 20220211_1244.toc.txt: video is fine again, but the chunk_producer_instance_n has some strange -106, -107 ... values, and the chunk_producer_playlist_n now looks like chunk_producer_instance_n

maybe instance and playlist variables were accidentally swapped somewhere

maybe there were additional files I should have pulled... ?

there are still 2 additional daq entries on the toc before other things (the other backends?) come online (can look at the fictrac_frame_num to see the triplets of toc lines indicating a do: from the config file.).

OK, this all seems to be working now.

chunk_producer_playlist_n is the index of the playlist item in the playlist and seems to be correct in the file uploaded. To be honest if you just look at the toc then you can use the identifier to map the item back to its entry in the playlist. The _n values are just because .h5 doesnt let you store strings (ideally the identifier) intelligently.

chunk_producer_instance_n being -100 and decreasing is expected. Its now code for these were called by play_item. As long as these are the same values in the .h5 then the toc fulfills its role of being able to find the start of the row for the toc entry (by the addressing tuple chunk_producer_playlist_n, chunk_producer_instance_n)

the extra toc entries are the duplicated experiment _start entries and should be easy to ignore or a quick fix for @davidt0x to solve. As I said, I wrote all this without testing as I'm away from my Windows PC.

nzjrs commented 2 years ago

(also additional duplicate toc entries called _silence are because annoyingly for technical reasons many backends always need to be playing / outputting something)

nzjrs commented 2 years ago

oh the meaning of some toc stanzas being CHUNK_producer_playlist_n vs producer_playlist_n was to indicate if the backend is chunked or not - but they otherwise should convey the same information (which is the index into the playlist definition of the item).

the chunk_producer_instance_n and producer_instance_n values don't actually have a meaning beyond they should be sufficiently unique to allow unambiguous finding of the start/correct row in the appropriate .h5 file for the toc entry.

mwangq commented 2 years ago

from the sync doc producer_playlist_n is the "running count of the number of stimulus items the backend playlist has played" whereas the producer_instance_n is the "numerical identifier of the instance of the underlying stimulus item"

you mention "the index into the playlist definition of the item" -- pretty sure that should be producer_instance_n, not producer_playlist_n (in your last comment) then?

it sounds to me like the chunk_producer_playlist_n and chunk_producer_instance_n values should be swapped... how do i do that...

mwangq commented 2 years ago

I thought (chunk or not) producer_instance_n would indicate a numerical identifier for the playlist item (e.g. [2, 2, 0, 0, 1]), and (chunk or not) producer_playlist_n would indicate the running count of the total number of stimulus items from a backend (e.g. [0, 1, 2, 3, 4]) (or for audio/daq: -105, -106, -107, -108, -109).

I am currently writing code to identify stim info:

how stim numerical identifier (e.g. 2) maps onto stim playlist name (e.g. "grating"). this is info that is ONLY available in the toc. currently, regardless of backend, I check the "producer_instance_n" field for this value. but in the current (see previous toc), we have instance_n and playlist_n swapped for audio/daq)
the fictrac frames corresponding to a given stim: to get this, I basically run. np.diff on both producer_instance_n and producer_playlist_n (in the .h5 sync files) to find the transition points between stimuli. if stimuli aren't repeated back-to-back then these are the same. if stimuli are repeated back-to-back then that's where the running count (producer_playlist_n as I understand it) is useful! which has now been fixed for the audio/daq (except saved into the "instance_n" field)

nzjrs commented 2 years ago

_producer_instance_n is to disambiguate in any combined .h5 file the case where two playlists are identical (one item named, 'foo') but one runs on daq and one runs on audio. _producer_instance_n will be unique for these two even though the first play through per backend of these two would have the same values of 'identifier' and producer_playlist_n

conceptually (because this is all dynamic) an analagous situation is loading a new playlist in while running that contains the same named item 'foo' but different data. In this case again, producer_playlist_n and identifier would be the same, but _producer_instance_n would be different

the meat of this bug is that also, conceptually in a dynamic experiment, calling (for example) play_item(foo) 10 times in a row was not saving enough information in the toc to disambiguate these were in fact 'foo' being output multiple times (a secondary consequence of the bug was the toc line being missing, but the primary cause was not handling this dynamic case like the other 're-user of things called the same thing and we cant store strings in h5 case'.

mwangq commented 2 years ago

in general, I understand the need for both a field (identifier of who was played) and a field (running count to distinguish start times of a stim played 10 times in a row). From the sync doc I linked above, the numerical 'identifier' for a specific item is supposed to be in instance_n. and the running_count is playlist_n.

from page 6 of the sync doc

Within the same backend (daq, audio, etc), producer_instance_n in the h5 files corresponds to item from the same row (record) in the TOC file, which corresponds to the identifier of the stimulus item in the backend playlist.

That is also how it is coded up in video_server and in the toc files I've sent! in either case, I think we should be consistent, and put the identifier (2 = 'grating') in instance_n, and the running count in playlist_n (as described in your sync doc) for audio and daq backends... or else, swap the fields in video_server, and ALSO update the sync doc to reflect the swap.

regarding your other comments - that's really interesting, i currently can't think of a situation where the experimenter would give items from two different backends with the same name... also flyVR2 doesn't have combined .h5 files anyway. i get separate daq/fic/sound/video server .h5 files. and in the toc file, the backend is listed, so even though both item names are 'foo', i know they are different items. :) (but I will never name two backends' items the same thing haha! :))

regarding your second para - also didn't know "loading a new playlist while running" was an option! I thought all playlist items were preset by the playlist section....

one thing I'm aware of, that Dudi and I noticed in January, is that in using a regular playlist (no time stanza, no experiment.py) it's absolutely necessary to name each item differently. If the items are listed with the same name, even though the fields (e.g. sine frequency) are different, flyVR will play the item based on the fields from the first time that item was named. anyway, that is how I found out that playlist name items have to be unique!

nzjrs commented 2 years ago

The test is the canonical description of the meaning of the fields. If there is a disagreement between to doc and the data or the test the test is correct and I have made a mistake writing the doc.

The explanation above was to illustrate potential degenerate cases students might end up with, not to suggest they make sense or are 'supported' in any sense.

I admit understanding the meaning of these fields is complicated and challenging in the dynamic experiment case. I suggest moving in increasing complexity from a static to a dynamic experiment and identifying issues along the way and filing new bugs - because (with the exception of multiple start messages) the bug at the core of this issue has been resolved.

mwangq commented 2 years ago

I see. Thanks very much for explaining, and for your help on this topic, @nzjrs !!

murthylab / fly-vr

Missing entry in TOC file for first video stimulus in playlist mode. #16