slub / dfg-viewer

The DFG Viewer is a free web service for browsing digitized books from remote library repositories in a rich and dynamic environment.
https://dfg-viewer.de
GNU General Public License v3.0
29 stars 26 forks source link

Update init `data.xml` file with new tenant data (presentation v5.0) #284

Open csidirop opened 3 months ago

csidirop commented 3 months ago

See https://github.com/slub/dfg-viewer/issues/283

Structures and Metadata are imported correctly. Entries now have the correct Data Format (see https://github.com/kitodo/kitodo-presentation/issues/1147) and only theirs.

But I just can't get the default namespaces to be imported as well: grafik

I manually added the old subtree, but it still don't works.

beatrycze-volk commented 3 months ago

I have few questions, I hope they can help you to fix the problem with data insertion.

  1. Manually added tx_dlf_format table has mixed order of uids
                        <table index="tx_dlf_formats" type="array">
                <rec index="3" type="array">
                    <uid>3</uid>
                    <pid>0</pid>
                    <title>ALTO</title>
                    <relations index="rels" type="array"></relations>
                    <softrefs type="array"></softrefs>
                </rec>
                <rec index="4" type="array">
                    <uid>4</uid>
                    <pid>0</pid>
                    <title>IIIF1</title>
                    <relations index="rels" type="array"></relations>
                    <softrefs type="array"></softrefs>
                </rec>
                <rec index="5" type="array">
                    <uid>5</uid>
                    <pid>0</pid>
                    <title>IIIF2</title>
                    <relations index="rels" type="array"></relations>
                    <softrefs type="array"></softrefs>
                </rec>
                <rec index="6" type="array">
                    <uid>6</uid>
                    <pid>0</pid>
                    <title>IIIF3</title>
                    <relations index="rels" type="array"></relations>
                    <softrefs type="array"></softrefs>
                </rec>
                <rec index="2" type="array">
                    <uid>2</uid>
                    <pid>0</pid>
                    <title>MODS</title>
                    <relations index="rels" type="array"></relations>
                    <softrefs type="array"></softrefs>
                </rec>
                <rec index="1" type="array">
                    <uid>1</uid>
                    <pid>0</pid>
                    <title>TEIHDR</title>
                    <relations index="rels" type="array"></relations>
                    <softrefs type="array"></softrefs>
                </rec>
            </table>

Do you think that it could help if they would be sorted 1, 2, 3, 4, 5, 6?

  1. tx_dlf_formats is only inserted in section records type="array" https://github.com/slub/dfg-viewer/blob/0eb137168d77cfdd4d0edaa8a8f033a8ff9baa3b/Initialisation/data.xml#L35 but missing in pid_lookup type="array" section https://github.com/slub/dfg-viewer/blob/0eb137168d77cfdd4d0edaa8a8f033a8ff9baa3b/Initialisation/data.xml#L3763 and is also missing in actual inserting of records to database https://github.com/slub/dfg-viewer/blob/0eb137168d77cfdd4d0edaa8a8f033a8ff9baa3b/Initialisation/data.xml#L4080

Could you also add it there?

  1. Is it possible that now with usage of EXTBASE the data is validated during import?

In this file are first inserted table rows of tx_dlf_metadata table and their are containing ids to tx_dlf_metadataformat rows which are not yet inserted. Additionally tx_dlf_metadataformat rows references rows from tx_dlf_formats which are not existing at all.

Ein frohes Osterfest! :)

csidirop commented 3 months ago

Hi, sorry I had no time to answer earlier.

It took me a bit to realize what the issue was, but you comments showed me what was missing and the right direction. Now should everything work again:

grafik

grafik

What I do not know is, why PHPStan is throwing an error now.

beatrycze-volk commented 3 months ago

It took me a bit to realize what the issue was, but you comments showed me what was missing and the right direction. Now should everything work again:

Great! :)

What I do not know is, why PHPStan is throwing an error now.

I think, you need to rebase your branch on top of the current master branch. It should fix the problem.

Edit: Or actually I need to fix it, as it is definition of the error which doesn't exist anymore in Kitodo.Production.

csidirop commented 3 months ago

What I do not know is, why PHPStan is throwing an error now.

I think, you need to rebase your branch on top of the current master branch. It should fix the problem.

Edit: Or actually I need to fix it, as it is definition of the error which doesn't exist anymore in Kitodo.Production.

Yes, I had the same error when requiring this branch by composer. Composer loads an older version of presentation without tha changes. Trying to open a document results an exactly that error Call to undefined method Kitodo\Dlf\Domain\Model\Document::getCurrentDocument(). But with the current master everything works as aspected

beatrycze-volk commented 3 months ago

I have removed this error message from configuration file. Now you can rebase your branch :)

csidirop commented 2 months ago

There is still an error unfortunately. Some documents throwing following exception while indexing:

Uncaught TYPO3 Exception Argument 1 passed to Kitodo\Dlf\Domain\Model\Document::setStructure() must be an instance of Kitodo\Dlf\Domain\Model\Structure, null given, called in /var/www/typo3/public/typo3conf/ext/dlf/Classes/Command/BaseCommand.php on line 227 thrown in file /var/www/typo3/public/typo3conf/ext/dlf/Classes/Domain/Model/Document.php in line 474

Need a bit more work here ...

csidirop commented 2 months ago

I am unsure why this issue occurs. When I initialize a new instance using a clean export (without the new tenant data), such as the file here cleanstate.xml.txt, and then add the new tenant data, everything works as expected.

However, if I use this export to initialize a new instance, everything appears to be working correctly at first, but then throws an error on some documents.

beatrycze-volk commented 2 months ago

I am unsure why this issue occurs. When I initialize a new instance using a clean export (without the new tenant data), such as the file here cleanstate.xml.txt, and then add the new tenant data, everything works as expected.

However, if I use this export to initialize a new instance, everything appears to be working correctly at first, but then throws an error on some documents.

You mean this error with empty structure? If yes, then probably some of the documents are missing the structures. Adding one to each of the document should solve the problem.

csidirop commented 1 month ago

First sorry for the late response! And second sorry, sometimes I express myself really unclearly. What I ment:

But I looked now deeper into it and found following: Some structure types have a tailing zero as index_name (eg. monograph0). Presumably to have unique index name. If the indexname with tailing zero is representing the German translation, the indexing aborts. Otherwise there is no issue. Manually renaming those fixes the issue.

Here a screenshot: grafik In this example the type letter would throw an exception but document wouldn't.

Setting up the New Tenant manually does not produce entries with tailing zeros: grafik

~I remember that we added tailing zeros an some point in presentation. Why again? Uniqueness?~ (Edit: we did not. I made a proposal for the metadata) Are those missing in the first place and TYPO3 is adding them when importing the xml file? (I don't remember how the init xml is handelt)

beatrycze-volk commented 1 month ago

But I looked now deeper into it and found following: Some structure types have a tailing zero as index_name (eg. monograph0). Presumably to have unique index name. If the indexname with tailing zero is representing the German translation, the indexing aborts. Otherwise there is no issue. Manually renaming those fixes the issue.

In this example the type letter would throw an exception but document wouldn't.

I have checked out the data.xml file to see how looks those two structures there.

Letter

Parent structure

Translation structure

<tablerow index="tx_dlf_structures:150" type="array">
    <fieldlist index="data" type="array">
        <field index="uid" type="integer">150</field>
        ...
        <field index="sys_language_uid" type="integer">0</field>
        <field index="l18n_parent" type="integer">0</field>
        <field index="l18n_diffsource">a:7:{s:8:&quot;toplevel&quot;;N;s:5:&quot;label&quot;;N;s:10:&quot;index_name&quot;;N;s:8:&quot;oai_name&quot;;N;s:16:&quot;sys_language_uid&quot;;N;s:6:&quot;hidden&quot;;N;s:6:&quot;status&quot;;N;}</field>
        ...
        <field index="label">Brief</field>
        <field index="index_name">letter</field>
        ...
    </fieldlist>
    <related index="rels" type="array"></related>
</tablerow>
<tablerow index="tx_dlf_structures:80" type="array">
    <fieldlist index="data" type="array">
        <field index="uid" type="integer">80</field>
        ...
        <field index="sys_language_uid" type="integer">1</field>
        <field index="l18n_parent" type="integer">150</field>
        <field index="l18n_diffsource">a:2:{s:16:&quot;sys_language_uid&quot;;N;s:11:&quot;l18n_parent&quot;;N;}</field>
        ...
        <field index="label">Letter</field>
        <field index="index_name">letter0</field>
        ...
    </fieldlist>
    <related index="rels" type="array">
        ...
    </related>
</tablerow>

Document

Parent structure

Translation structure

<tablerow index="tx_dlf_structures:129" type="array">
    <fieldlist index="data" type="array">
        <field index="uid" type="integer">129</field>
        ...
        <field index="sys_language_uid" type="integer">0</field>
        <field index="l18n_parent" type="integer">0</field>
        <field index="l18n_diffsource">a:8:{s:8:&quot;toplevel&quot;;N;s:5:&quot;label&quot;;N;s:10:&quot;index_name&quot;;N;s:8:&quot;oai_name&quot;;N;s:9:&quot;thumbnail&quot;;N;s:16:&quot;sys_language_uid&quot;;N;s:6:&quot;hidden&quot;;N;s:6:&quot;status&quot;;N;}</field>
        ...
        <field index="label">Dokument</field>
        <field index="index_name">document</field>
        ...
    </fieldlist>
    <related index="rels" type="array"></related>
</tablerow>
<tablerow index="tx_dlf_structures:131" type="array">
    <fieldlist index="data" type="array">
        <field index="uid" type="integer">131</field>
        ...
        <field index="sys_language_uid" type="integer">1</field>
        <field index="l18n_parent" type="integer">129</field>
        <field index="l18n_diffsource">a:2:{s:16:&quot;sys_language_uid&quot;;N;s:11:&quot;l18n_parent&quot;;N;}</field>
        ...
        <field index="label">Document</field>
        <field index="index_name">document0</field>
        ...
    </fieldlist>
    <related index="rels" type="array">
        ...
    </related>
</tablerow>

At first it looks quiet ok for both structures, but then I have noticed one difference between them. For document uid of the parent is 129 and then translation 131. It makes totally sens, the parent is saved and after that the translation. But for letter is the other way round. The parent has uid 150 and the translation 80.

I have observed similar dependency for other structures.

I'm not sure if it really matters but maybe the records are saved one after another and it causes the trouble here. What do you think?

csidirop commented 1 month ago

Thank you for pointing me to the right direction!

The issue was again, that the index_name had to be unique but wasn't! I fixed this for metadata (https://github.com/kitodo/kitodo-presentation/pull/1186) but not for structures (which I have done now: https://github.com/kitodo/kitodo-presentation/pull/1211).

With the new export the metadata and structures are getting set up correct: grafik

Indexing works again for all my test documents worked again. I hope that its done now :D

beatrycze-volk commented 1 month ago

Thanks for your work :)

Do you think that it will still work properly with changes from https://github.com/kitodo/kitodo-presentation/pull/1212 ?

csidirop commented 1 month ago

So at first everything looked good, but after exporting the data TYPO3 again added tailing zeros to the metadata and structure index_names. Setting up DFG-Viewer resulted in the same mess: grafik

We are thinking that TYPO3 has some issue with the index_name. According to the database column (ext_tables.sql#L69 or ext_tables.sql#L97), we know that the value is not UNIQUE but a KEY. Maybe the latter is messing with TYPO3.

Because in addition this, if we move an entry in the backend TYPO3 will again add a tailing zero: grafik Like it says The value ... has been changed ... as it is required to be unique.

So we could ignore this issue, hope no one moves the entries and just delete the zeros from the init file, or look deeper into it. What was the problem with my solution again, that we append the language_id to the index_name?

beatrycze-volk commented 1 month ago

We are thinking that TYPO3 has some issue with the index_name. According to the database column (ext_tables.sql#L69 or ext_tables.sql#L97), we know that the value is not UNIQUE but a KEY. Maybe the latter is messing with TYPO3.

Yes, it looks like some problem inside the TYPO3... I would say, we leave it as it is.

So we could ignore this issue, hope no one moves the entries and just delete the zeros from the init file, or look deeper into it. What was the problem with my solution again, that we append the language_id to the index_name?

It was not a problem with your solution. It was problem with the way how Kitodo inserts data in backend controller (using Extbase instead of DataHandler). It was causing a lot of mess and confusion.

Eventually we could try still one approach. If it is possible, we could remove index_name field for the translated records and see if then insertion works correctly. If not, then we just leave it in the current state and I merge it. What do you think?

csidirop commented 1 month ago

Eventually we could try still one approach. If it is possible, we could remove index_name field for the translated records and see if then insertion works correctly. If not, then we just leave it in the current state and I merge it. What do you think?

I had the same thought :D lets try it

beatrycze-volk commented 2 weeks ago

Eventually we could try still one approach. If it is possible, we could remove index_name field for the translated records and see if then insertion works correctly. If not, then we just leave it in the current state and I merge it. What do you think?

I had the same thought :D lets try it

How looks situation here?

csidirop commented 1 week ago

Good question. I cant remember where I stopped before the BiblioCon. I'll look at it again during the day.