mff-uk / odcs

ODCleanStore

Problem with unresponsive frontend when a certain DPU is logging extensively #1238

Closed tomas-knap closed 10 years ago

tomas-knap commented 10 years ago

This is not a task solely for Bohuslav; a joint effort is needed (@skodape @janvojt):

Problem: When a certain DPU is logging quite extensively, the whole GUI becomes unusable. When opening the execution detail of a running pipeline, it loads and loads, and it takes a really long time before anything happens. For other users, the GUI is not usable at all.

It is urgent because it was discovered by Jakub while he was trying the app, and he said this must be solved in order to have a defensible result.

This problem was originally found when the ARES-test pipeline was run (ODCS), but it also happens for other pipelines, e.g. ARES 9000.

Solutions:

1) Try MySQL and see whether that helps (trying it right now)
2) Offload detailed logs to a file
3) Tune the SQL appender logging to the database (transaction isolation)

@skodape @janvojt Please discuss.

skodapetr commented 10 years ago

4) Try some background data loading. But this would require more than just background data loading for ROC's data sources, so I would leave this as the last possible enhancement/solution.

bogo777 commented 10 years ago

So the problem is that the database is flooded with logging data, and it takes a long time to handle requests from the frontend?

The only thing I think could be improved in the frontend is to check whether the last refresh has finished and, if not, simply not start another one until it has. This could easily be done for the "isChanged" queries which check for new/modified data; for refreshing the actual data it probably won't be so easy.
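A minimal sketch of such a guard, with purely illustrative names (an AtomicBoolean flag around the periodic refresh, not the actual ODCS refresher class):

    import java.util.concurrent.atomic.AtomicBoolean;

    // Hypothetical guard around the periodic refresh: a new refresh is started
    // only if the previous one has already finished.
    public class ExecutionDetailRefresher {

        private final AtomicBoolean refreshInProgress = new AtomicBoolean(false);

        /** Called by the refresh timer; skips the cycle if the last refresh still runs. */
        public void onRefreshTick() {
            if (!refreshInProgress.compareAndSet(false, true)) {
                return; // previous refresh has not finished yet, do not start another one
            }
            try {
                refreshExecutionDetail(); // the expensive "isChanged" check + data reload
            } finally {
                refreshInProgress.set(false);
            }
        }

        private void refreshExecutionDetail() {
            // query the DB for new/modified logs and update the UI components
        }
    }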

But I think the main problem is that a single DPU probably floods the database...

skodapetr commented 10 years ago

So I looked into it. I wanted to check the logs from the SqlAppender to see how it logs and how much it loads the DB (it logs how many entries it stored in the DB at once and how long that took). But I didn't find any such logs, so I ran it locally, and there are none there either.

The reason is that this is logged only in the code path where log entries are stored in batches. So I found out that log entries are currently stored in the DB one at a time...

Why are they stored one at a time and not in batches? The appender contains the following code.

        // prepare and start the connection source
        connectionSource = new LoggingConnectionSource(dataSource);
        connectionSource.setContext(this.getContext());
        connectionSource.start();

        // get information about the source
        supportsBatchUpdates = connectionSource.supportsBatchUpdates();

If supportsBatchUpdates == true, entries are stored in batches of 50, or once every 4.3 seconds. Unfortunately (I don't know since when), this is not true but false. So for every single log entry a DB connection is obtained, an insert is created and executed, and the connection is closed.

So this is probably a question for @janvojt: does he know when the change happened in the relevant layer, and are we able to "revert" it?

Until then, in my opinion, any further attempts are pointless (as Bohuslav wrote), because this is probably what is hammering the DB. Moreover, it matches the observation that the situation is worse than it ever was "before".
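For illustration, a minimal sketch of the batched path the appender should take when supportsBatchUpdates is true: one shared connection, one PreparedStatement, flushed in groups instead of one connection per log entry. The column names follow the logging table queries in this thread, but the helper class itself (and the column types it assumes) is hypothetical, not the actual SqlAppender code:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Hypothetical batched writer for the logging table.
    public class BatchedLogWriter {

        private static final int BATCH_SIZE = 50;

        private final Connection connection; // shared, kept open by the appender

        public BatchedLogWriter(Connection connection) {
            this.connection = connection;
        }

        public void write(List<LogEntry> entries) throws SQLException {
            String sql = "INSERT INTO logging (DPU, EXECUTION, logLevel, message, logger, timestmp) "
                       + "VALUES (?, ?, ?, ?, ?, ?)";
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                int pending = 0;
                for (LogEntry e : entries) {
                    ps.setLong(1, e.dpuId);
                    ps.setLong(2, e.executionId);
                    ps.setInt(3, e.level);
                    ps.setString(4, e.message);
                    ps.setString(5, e.logger);
                    ps.setLong(6, e.timestamp);
                    ps.addBatch();
                    if (++pending == BATCH_SIZE) {
                        ps.executeBatch(); // one round trip for the whole batch
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    ps.executeBatch();
                }
            }
        }

        /** Simple value holder for a single log entry (illustrative only). */
        public static class LogEntry {
            long dpuId;
            long executionId;
            int level;
            String message;
            String logger;
            long timestamp;
        }
    }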

tomas-knap commented 10 years ago

@janvojt Honza, please look into this as soon as possible.

It needs to log in those batches over a shared connection.

skodapetr commented 10 years ago

Well... it looks like commit https://github.com/mff-uk/ODCS/commit/4ba33bb890e417f7f7c2acb577180ded3df5b143 still works, while in the next commit, https://github.com/mff-uk/ODCS/commit/e0369f4e035b72dcfcea11f571e43274e5689811, we get supportsBatchUpdates == false.

So the problem is probably somehow connected with the addition of the MySQL functionality.

tomas-knap commented 10 years ago

Actually there are three issues at the moment:

1) The logger which inserts logs in batches is not working (see above). It was working before @janvojt merged the mysql branch to master.

2) The query counting records for the given execution takes quite a lot of time (0.4 s for 1 million records). Could we solve that by having an extra table/column with log counts for each execution, updated via a trigger? (A sketch of this is below.)

3) When examining a query of this type (get 6 logs at offset 10, 1000, 40000):

 select id AS a1, DPU AS a2, EXECUTION AS a3, logLevel AS a4, message AS a5, logger AS a6, stack_trace AS a7, timestmp AS a8 FROM logging WHERE (EXECUTION = 9) limit 10,6;

For limit 1000,6 it takes 0.01 s; for limit 40000,6 it takes 0.15 s. But I do not know whether that can be improved easily.
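Regarding point 2), a minimal sketch of a trigger-maintained count table; the table name exec_log_count and the trigger name are purely illustrative, and the DDL is issued here through plain JDBC:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Hypothetical setup keeping per-execution log counts up to date via a trigger,
    // so the frontend can read the count with a primary-key lookup instead of COUNT(*).
    public class ExecLogCountSetup {

        public void create(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.executeUpdate(
                        "CREATE TABLE exec_log_count ("
                        + " execution BIGINT PRIMARY KEY,"
                        + " cnt BIGINT NOT NULL)");

                // Single-statement trigger body, so it can be created over JDBC
                // without any DELIMITER handling.
                st.executeUpdate(
                        "CREATE TRIGGER logging_count_ai AFTER INSERT ON logging "
                        + "FOR EACH ROW "
                        + "INSERT INTO exec_log_count (execution, cnt) VALUES (NEW.EXECUTION, 1) "
                        + "ON DUPLICATE KEY UPDATE cnt = cnt + 1");
            }
        }
    }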

tomas-knap commented 10 years ago

We may consider: https://dev.mysql.com/doc/refman/5.5/en/insert-delayed.html

tomas-knap commented 10 years ago

Regarding 3), EXPLAIN gives:

+----+-------------+---------+------+---------------------+---------------------+---------+-------+---------+----------+-------+
| id | select_type | table   | type | possible_keys       | key                 | key_len | ref   | rows    | filtered | Extra |
+----+-------------+---------+------+---------------------+---------------------+---------+-------+---------+----------+-------+
|  1 | SIMPLE      | logging | ref  | ix_LOGGIN_execution | ix_LOGGIN_execution | 5       | const | 1944269 |   100.00 | NULL  |
+----+-------------+---------+------+---------------------+---------------------+---------+-------+---------+----------+-------+

skodapetr commented 10 years ago

It seems that when LIMIT is used, MySQL does a full scan over the result of the preceding WHERE, so a query for the end of the logs can take as much as 8 seconds.

However, while examining the logs I found a few things that could be improved:

1) setExecution is called twice in a row, and each call takes 8 seconds. So one of the calls should go. @bogo777, would it be enough to add a condition at the beginning of the call that checks whether the execution actually changes (and return immediately if it does not), or would that change be more complicated?

~~2) If the pipeline is finished, show the first page of logs instead of the last one. That removes LIMIT with large offsets; if the user wants to see the last logs, they will simply have to wait. For finished pipelines the events/messages are probably more interesting anyway. 2.1) For a running pipeline the last page is still shown.~~

3) The table always asks for the first row of its page and only then for the data it actually uses. If the first row is not really used in the result, it would be possible to return some dummy value and again save a LIMIT query. The other option is to fetch the remaining data together with the first row right away.

4) When watching logs live, loading can still take a long time. For that reason it would be good if the user could change the auto-refresh interval (so that the application stays responsive at least some of the time).

5) Number the logs within each execution. Then LIMIT would no longer be needed, and the condition would move into the WHERE clause.

@bogo777 After discussing it with @tomas-knap we concluded that point 5) alone should bring decent results (a sketch of the idea is below). Nevertheless, points 1) and 3) from the original list could still be done as well.
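A minimal sketch of point 5), assuming a per-execution sequence column (called relative_id here purely for illustration; the column types are likewise guesses): a page is then selected by a range condition on that column instead of LIMIT with a large offset.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical keyset-style paging over the logging table. Instead of
    // "LIMIT offset, pageSize" (which makes MySQL walk through the whole offset),
    // logs are numbered per execution and a page is a WHERE range on that number.
    public class LogPageDao {

        public List<String> loadPage(Connection conn, long executionId,
                                     long firstRelativeId, int pageSize) throws SQLException {
            String sql = "SELECT relative_id, logLevel, message "
                       + "FROM logging "
                       + "WHERE EXECUTION = ? AND relative_id >= ? AND relative_id < ? "
                       + "ORDER BY relative_id";
            List<String> rows = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, executionId);
                ps.setLong(2, firstRelativeId);
                ps.setLong(3, firstRelativeId + pageSize);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows.add(rs.getLong(1) + " [" + rs.getInt(2) + "] " + rs.getString(3));
                    }
                }
            }
            return rows;
        }
    }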

tomas-knap commented 10 years ago

What Petr describes should substantially speed up the LIMIT M,N query. However, the count() query will still be there, and for a DB with 4 million records and a query for an execution that has 1 million records it takes over a second. At the very least, the count queries could be cached, i.e. when somebody asks a second time for the count of the same execution that has already finished, it is taken from the cache. I would leave the caching to the framework.
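A minimal sketch of that caching, assuming counts are cached only for executions that have already finished (class and method names are illustrative, not the actual DAO):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical cache for log counts of finished executions: the expensive
    // count() query runs at most once per finished execution.
    public class LogCountCache {

        private final Map<Long, Long> countsByExecution = new ConcurrentHashMap<>();

        public long getLogCount(long executionId, boolean executionFinished) {
            if (!executionFinished) {
                return countLogsInDb(executionId); // still running, the count can change
            }
            return countsByExecution.computeIfAbsent(executionId, this::countLogsInDb);
        }

        private long countLogsInDb(long executionId) {
            // SELECT COUNT(*) FROM logging WHERE EXECUTION = ?
            return 0L; // placeholder for the real DAO call
        }
    }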

tomas-knap commented 10 years ago

@janvojt Just out of curiosity: when I ask for the number of records in the logging table, it uses the index ix_LOGGING_dpu as the key and not the index on the primary key. Any explanation?

mysql> explain SELECT COUNT(*) FROM logging ;
+----+-------------+---------+-------+---------------+----------------+---------+------+---------+-------------+
| id | select_type | table   | type  | possible_keys | key            | key_len | ref  | rows    | Extra       |
+----+-------------+---------+-------+---------------+----------------+---------+------+---------+-------------+
|  1 | SIMPLE      | logging | index | NULL          | ix_LOGGING_dpu | 5       | NULL | 3888538 | Using index |
+----+-------------+---------+-------+---------------+----------------+---------+------+---------+-------------+
1 row in set (0.00 sec)
skodapetr commented 10 years ago

Isn't it sparser? The primary index will certainly be larger. Of the three indexes that exist there, this one should be the smallest in terms of the range of values.

janvojt commented 10 years ago

I fixed supportsBatchUpdates = true in the previous commit.

Will look into LIMIT for MySQL.

tomas-knap commented 10 years ago

OK, thanks. As for the LIMIT, IMHO it can only be solved by converting it into a WHERE condition.

tomas-knap commented 10 years ago

Get in sync with @skodape so that your changes come together on Sunday.

skodapetr commented 10 years ago

@janvojt I'm not sure what a reassign and then a solving tag without any comment means; please be a bit more specific.

In any case, as for what was assigned to me (at least originally), I tried using WHERE instead of LIMIT (it turned out to be harder than I expected), but if you already have another solution ready, we can pick the better one. Honestly, I would be quite curious about another solution, because I was not able to come up with anything very nice that is independent of EclipseLink. So much for point number 5.

Points 1, 3, and 4 are still open. I won't do anything more today, and probably not tomorrow morning either, so I will probably get to it in the afternoon. If you want to take something, please be more specific.

PS: I don't want to push you and Bohuslav out of this, but after all, I have no other urgent/high issues (documentation doesn't count).

janvojt commented 10 years ago

Using WHERE, numbering and so on is nonsense, don't even try it, you won't get anywhere with that. On odcs-test I have already changed the DB engine for the logging table to MyISAM, which looks to be about 3 times more efficient. I also changed bigint to int, so the index is smaller. Tomorrow I will commit it to schema.sql. We can also try table partitioning, which suits this type of table and could help. Storing the count could also make it more efficient, but the question is whether it has to be thread-safe.
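For illustration, the kind of DDL this describes (engine change and narrower ids), issued here through plain JDBC; the exact columns are assumptions, not the committed schema.sql:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Hypothetical tuning sketch for the logging table.
    public class LoggingTableTuning {

        public void apply(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                // MyISAM: cheaper inserts and COUNT(*) for this append-only table
                st.executeUpdate("ALTER TABLE logging ENGINE = MyISAM");

                // Smaller key columns mean smaller indexes
                st.executeUpdate("ALTER TABLE logging MODIFY COLUMN id INT NOT NULL AUTO_INCREMENT");
                st.executeUpdate("ALTER TABLE logging MODIFY COLUMN EXECUTION INT");

                // Table partitioning by EXECUTION could also be tried, but MySQL requires
                // the partitioning column to be part of every unique key on the table, so
                // the primary key would have to be widened to (id, EXECUTION) first.
            }
        }
    }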


tomas-knap commented 10 years ago

The problem is that I need to make it more efficient not 3 times, but by at least two orders of magnitude. MyISAM will certainly be better for this purpose. The WHERE approach seems reasonable to me; SELECT ... WHERE x >= / < queries are very fast with indexes and do not degrade linearly as the table grows. If you can, please write something about table partitioning, and also why you are not in favor of the WHERE approach.

janvojt commented 10 years ago

Hmmm, you're right. It sounds unbelievable, but it looks like the MySQL optimizer is quite dumb and does early row lookups for the whole offset.


skodapetr commented 10 years ago

@janvojt Too late, such a change is simply too tempting. Besides, I expected that, given the nature of the table, it would already be "tuned" for this, since the problem with slow logs has not appeared only yesterday.

However, my changes are not particularly extensive, so if we find a fast solution without them, removing them is a matter of moments. After all, a nice DB configuration is better than modifying the query in the DAO layer.

@tomas-knap Maybe you could give Honza access to MySQL on odcs so he can test it there directly. You need about a million logs from a single DPU and then look at, say, the last 20 of them.

janvojt commented 10 years ago

IMHO, a big problem is also loading data into the table component. Two queries should be enough for that: one count plus one select for the data. But from the log at http://pastebin.com/8MGjXFwy I can see that we query the logging table a total of 15 times...

I know that because of Vaadin and the data types there was one extra query for the first record. But it should still be only 3 queries. So I think it is either a bug in ROC, which Petr knows best, or ROC is used incorrectly and things are called multiple times unnecessarily there, which I think was Bohuslav's part.

P.S. I do have the data and access to MySQL on odcs.xrg.


skodapetr commented 10 years ago

The link doesn't work for me :( but I would be very interested in it, because on my machine I counted around 4 to 5 of those queries.

A certain duplication is described in point 2.1), i.e. the setExecution call. Another possible saving is point 2.3 (I think), but even so two queries are needed, and if a query takes 8 seconds, that is still too much.

janvojt commented 10 years ago

I have fixed the link.

skodapetr commented 10 years ago

Yeah, that more or less mirrors what I wrote in those other points.

Some of it comes from how Bohuslav sets up the filters and calls things repeatedly (it is one of the points I wrote about), and some from how the table (addon) accesses the data. On the other hand, Bohuslav has plenty of other issues (see your last one), and this one theoretically does not show up unless there are enough logs.

But even so, if you have 3 queries and they take 0.5 s, 8 s and 8 s, we are still way off. That is why we tried the WHERE (although of course it is not a silver bullet), but we will not know whether it helped until Tomáš deploys it.

skodapetr commented 10 years ago

Also, how old is your version? When I count it on my local machine, I get 2x3 (i.e. two calls to setExecution) + 1 (loading the data) to display the dialog.

Moreover, it seems to me that you are counting not only the first display but also one more click needed for the tabs to show correctly. That is, not just the span of a single display of the dialog, i.e. from DebuggingView - initialize() to DebuggingView - initialize() -> done.

But if you take it twice, it is true that it comes out to around those 15.

bogo777 commented 10 years ago

@skodape By the double call to setExecution(), do you mean the call from initialize() in DebuggingView and then setExecution() called directly from ViewImpl... or did I overlook it somewhere else as well?

skodapetr commented 10 years ago

@bogo777 Yep, I mean exactly that. It occurred to me to remember the last id at the beginning of setExecution and, if it is the same, just return. But I'm not sure whether that would break something.
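A minimal sketch of that early return, with illustrative names (the real setExecution works with the execution entity, so the details would differ):

    // Hypothetical guard inside setExecution(): remember the id of the last execution
    // and skip the expensive reload when the same execution is set again.
    public class DebuggingViewSketch {

        private Long lastExecutionId;

        public void setExecution(long executionId) {
            if (lastExecutionId != null && lastExecutionId == executionId) {
                return; // same execution as before, nothing to reload
            }
            lastExecutionId = executionId;
            reloadLogsAndEvents(executionId); // the expensive part we want to avoid repeating
        }

        private void reloadLogsAndEvents(long executionId) {
            // query counts, logs and events for the execution and refresh the UI
        }
    }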

ghost commented 10 years ago

I have read all the way up to here :) When I was starting my work on the project, I asked @skodape why the project is divided into two separate applications (backend and frontend). The answer was something like: when the hard computations take place on the backend, the frontend should not be affected, nor become unresponsive. Well, here we are. The project is divided into two separate applications, and the frontend can become unresponsive. I think there may be other issues with this design too. (How do we handle a user editing a pipeline which is just about to start on the backend?)

Back to the topic: logging into the database is a bad idea too!

I cannot find any reason why we would need logging into the database. Is any user willing to see page 5/2148 of the log file of a pipeline run? Or is anybody willing to search all pipeline runs where the logging level was X, the message contained the string "Whohoho", and the timestamp was between X and Y?

The purpose of an SQL database is to provide searching in a table using WHERE. Is anybody interested in such searching over log files? I think nobody wants this. The main uses of a log file are:

  1. searching for keywords (grep-ing for errors, exceptions, class names, log levels)
  2. searching for a timestamp or the time around some timestamp (finding what happened at a certain time, especially when someone reports a problem that happened "last night around 1am")

In both usages, we fail.

  1. Searching for a keyword shows only the matching rows, which is something like grep-ing the log file. But wait, no context is shown. With grep, one usually uses the -B 10 -A 10 options to show at least the context around the message itself. Or, when browsing a log file in a text editor, one sees the context of the found message immediately. Our searching (better named filtering) is useless.
  2. Searching for a timestamp is annoying and time consuming. Who wants to use date pickers...

Since there are plenty of well-suited tools for browsing log files, it does not make sense to try to develop yet another one in our browser frontend. For DPU developers who deploy DPUs on localhost/a LAN host to test them, it would be more useful to have quick access to the log files on disk. They are used to that, know what to do with them, and have their favorite tooling set up for browsing log files...

As a first approach, let us give them a "download link" for a text/plain log file resource. They can browse the log file in the browser or download it (and instruct the browser to use a reasonable save location).

As a second step, we can improve access to log files by providing some 'disk'-like interface. Something that a DPU developer will see as a folder in his operating system. This is no problem when the DPU developer works on a localhost instance or a LAN host (samba, ssh, etc.). But when ODCS is on a remote host, the desperate developer will eventually wish he had SSH access to the host to find the log on disk and examine it (especially when the file is large, 200 MB or so).

So what is the purpose of showing a log file in the browser? I think it is only useful for watching the progress without downloading the log file. And for that purpose we need only something like 'tail' functionality. No paging, no searching/filtering. As the browser loads more and more log file content, the user always has Ctrl-F to search over it. He can copy&paste it into his favorite colorizing text editor to examine the log file. When the DPU execution stops, we will provide a download link. That's it.

And yes, we do not need the SQL database for this, so I propose: let's get rid of logging into the database! My proposal at this time is:

  1. Log to files on the backend, with a unique filename for each execution (use a database-sequence-generated number incorporated into the filename). Save the log file's relative location on disk into the database next to the execution entity.
  2. Provide a download link on the frontend when the execution is done.
  3. Since the log file lives on the backend, we need to transfer it to the frontend and then to the client. Use Java RMI or Spring Remoting with the rmiio library for this.
  4. Create an "onDemandAppender" for logback: when the user opens the log view screen (the tail -f screen) in the browser, the frontend makes an RMI call to the backend to obtain a (Remote)InputStream of the log file. The appender is created on the backend and attached to the pipeline execution. The appender uses a Java PipedOutputStream to write to the (Remote)InputStream. So logging uses two appenders, one default file-based and one showing the tail -f to the user. When the user exits the tail -f screen, the streams are closed and the appender 'removes itself' as its output stream was closed. (A rough sketch of such an appender follows this list.)
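A rough sketch of such an appender, assuming logback (as used elsewhere in the project); the RMI/RemoteInputStream plumbing is omitted, only the local OutputStream side and the self-removal on a closed stream are shown, and all names here are illustrative:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    import ch.qos.logback.classic.Logger;
    import ch.qos.logback.classic.spi.ILoggingEvent;
    import ch.qos.logback.core.AppenderBase;

    // Hypothetical "on demand" appender: attached when a user opens the tail -f view,
    // it writes formatted events into an OutputStream (e.g. a PipedOutputStream whose
    // reading end is exposed to the frontend). When the stream is closed, the appender
    // detaches itself from the logger.
    public class OnDemandTailAppender extends AppenderBase<ILoggingEvent> {

        private final OutputStream target;
        private final Logger attachedTo;

        public OnDemandTailAppender(OutputStream target, Logger attachedTo) {
            this.target = target;
            this.attachedTo = attachedTo;
        }

        @Override
        protected void append(ILoggingEvent event) {
            String line = event.getLevel() + " " + event.getLoggerName()
                    + " - " + event.getFormattedMessage() + "\n";
            try {
                target.write(line.getBytes(StandardCharsets.UTF_8));
                target.flush();
            } catch (IOException e) {
                // the user closed the tail -f view: detach and stop this appender
                attachedTo.detachAppender(this);
                stop();
            }
        }
    }

To actually receive events, the appender would have to be started and attached (appender.start(); logger.addAppender(appender);), with the reading end of the pipe wrapped as the stream returned to the frontend.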

Once we have RMI or Spring Remoting in place, we can also get rid of the plain TCP messaging between frontend and backend (the "check DB" message, the "ping, are you alive" message).

tomas-knap commented 10 years ago

Michal,

we will test the current solution (the improvement we made over the last days), which replaces the inefficient LIMIT/OFFSET queries with WHERE queries, for which indexes can be used effectively; in that way we increased the speed of the loading queries substantially. If that does not help, we will have to put the logs into files only.

It also depends on whether the DPU is logging reasonably or not. If the pipeline is run in normal mode, only INFO+ logs are stored. If the DPU produces millions of such logs, something is wrong.

Since we record both events and logs for each pipeline execution, we should definitely keep the events in the database (they contain the basic information about the pipeline execution), and we could keep the detailed log aside in a file with the tail -f functionality, as you also wrote.

janvojt commented 10 years ago

When I was starting my work on the project, I asked @skodape why the project is divided into two separate applications (backend and frontend). The answer was something like: when the hard computations take place on the backend, the frontend should not be affected, nor become unresponsive.

I believe separating frontend and backend provides more flexibility and a better design. With this setup it should be possible to have multiple (possibly different) frontend applications (which I have actually never tested, so I am not sure it works at the moment). Originally, when we were specifying the requirements for the project, we were considering a separate frontend for the administrator for browsing data.

Well, here we are. The project is divided into two separate applications, and the frontend can become unresponsive.

This is obviously true. Splitting an application with performance issues into 2 applications will always give you performance issues in at least 1 of them. The cause of this issue does not have anything to do with the backend. The problem was that MySQL did early row lookups, which is now resolved by replacing the LIMIT clause with a WHERE clause.

I think there may be other issues with this design too. (How do we handle a user editing a pipeline which is just about to start on the backend?)

I am not sure I follow your question here - is transaction isolation the answer? How would this be solved by having just 1 application? (don't forget about multi-user access and scheduling)

Logging into the database is a bad idea too!

In general, I agree.

Is any user willing to see page 5/2148 of the log file of a pipeline run?

Well, I also agree here. I actually think we should log into the database only when the pipeline is running in debug mode. Otherwise we should be logging to files (or log only error+ entries to the DB). But from an implementation point of view this makes no difference.

In both usages, we fail.

Both points are a question of user interface. If we added an input box for searching, where as a result we would be navigated to the relevant page, your first point would be invalid.

For DPU developers who deploy DPUs on localhost/a LAN host to test them, it would be more useful to have quick access to the log files on disk.

What about developers using remote host with no ssh access?

As a first approach, let us give them a "download link" for a text/plain log file resource. They can browse the log file in the browser or download it (and instruct the browser to use a reasonable save location).

This does not sound user-friendly at all. Why would I download the whole log file, when I may be interested only in the last 10 log lines relevant to the error causing the pipeline failure? And in the current solution, I can do this with just 2 clicks.

As a second step, we can improve access to log files by providing some 'disk'-like interface.

Sounds pretty complicated. How would you do this? What about permissions? Security? Performance on a slow connection?

we can also get rid of plain TCP messaging between frontend and backend

I agree that the current communication between frontend and backend is primitive. Spring Remoting would be much more flexible. However, this is a different topic.

ghost commented 10 years ago

I think there may be other issues with this design too. (How do we handle a user editing a pipeline which is just about to start on the backend?)

I am not sure I follow your question here - is transaction isolation the answer? How would this be solved by having just 1 application? (don't forget about multi-user access and scheduling)

It depends. PipelineGraph > List is fetchType EAGER, but Note > DPUInstanceRecord is fetched LAZY. Consider a user editing a pipeline and the backend starting up an execution of that pipeline. Can the user save changes to a DPUInstanceRecord after the execution of the pipeline started, but before it was fetched on the backend by that execution? Probably yes. It is not a problem of transaction isolation, but of "what is consistent and when". As a user, I would think that changing a pipeline while it is running will save changes only for future runs. Or am I wrong?

In both usages, we fail.

Both points are a question of user interface. If we added an input box for searching, where as a result we would be navigated to the relevant page, your first point would be invalid.

:) At this time, I am unable to find any text file AJAX viewers, but that would be the ideal solution.

For DPU developers who deploy DPUs on localhost/a LAN host to test them, it would be more useful to have quick access to the log files on disk.

What about developers using remote host with no ssh access?

That was only a statement that people who run it locally or on an SSH-accessible machine will probably use the console to read log files. Of course, for people who do not have remote shell access, we must come up with something useful.

As a first approach, let us give them a "download link" for a text/plain log file resource. They can browse the log file in the browser or download it (and instruct the browser to use a reasonable save location).

This does not sound user-friendly at all. Why would I download the whole log file, when I may be interested only in the last 10 log lines relevant to the error causing the pipeline failure? And in the current solution, I can do this with just 2 clicks.

Well, yes. I have a very precise idea of how log files should be displayed using AJAX, but it seems like nobody has developed it yet. Weird :)

As a second step, we can improve access to log files by providing some 'disk'-like interface.

Sounds pretty complicated. How would you do this? What about permissions? Security? Performance on a slow connection?

Read-only, of course. WebDAV is the first thing I can think of. Disk-like access is always "by chunks"; when you mount an NFS volume, it does not get downloaded immediately. But now I can imagine the perfect AJAX viewer, so implementing disk-like things... well, forget it. You are right. It has to be in the browser or nothing. Nobody will ever want to use such an interface.

janvojt commented 10 years ago

It depends. PipelineGraph > List is fetchType EAGER, but Note > DPUInstanceRecord is fetched LAZY. Consider a user editing a pipeline and the backend starting up an execution of that pipeline. Can the user save changes to a DPUInstanceRecord after the execution of the pipeline started, but before it was fetched on the backend by that execution? Probably yes. It is not a problem of transaction isolation, but of "what is consistent and when". As a user, I would think that changing a pipeline while it is running will save changes only for future runs. Or am I wrong?

I have not tested this, but I expect it currently works with the following mindset:

(Note that in case of a backend crash, obviously only the current pipeline state is recovered and the pipeline is resumed as it is read from the DB, which may cause quite unexpected behavior.)