[hibernate search] Introduce a common interface for DTO objects and data beans

kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.

http://www.kitodo.org/software/kitodoproduction/

GNU General Public License v3.0

64 stars 63 forks source link

[hibernate search] Introduce a common interface for DTO objects and data beans #6032

Closed matthias-ronge closed 2 months ago

matthias-ronge commented 7 months ago

Issue #5760 1c) -- part 1

Follow-up pull request to #5784 (immediate diff)

matthias-ronge commented 5 months ago

Yes, the DTOs shall be deleted, and many more files will be deleted later, but to reduce the pull requests’ size, I didn’t delete them yet. There will be deleted in a separate pull request. Yes, it is possible to delete the interface completely then, which would mean to move the Javadoc comments to the beans and remove the interface. This is a decision of our own taste as developers. First, introducing the interface was helpful for me to see and get the beans and DTOs really work equally (they did for 96%, but there were inconsitencies). So if there will be a majority for removing the interface later, I can do that. But it should happen after the other removals have happened.

henning-gerhardt commented 5 months ago

I don't think that is a good idea to access the Hibernate data directly in the UI - from a security view. There was many exploits in the used UI frameworks, bad written UI code and other mistakes which made it possible to access the database through the used ORM like Hibernate and UI to access (read / write / delete) the database data. From this point I would not recommend the way to expose the Hibernate objects to the UI and use a wrapping mechanism (like used POJO's / DTO's right now) to made the database data available in the UI. So removing them is a bad idea from my view of security to web applications.

matthias-ronge commented 5 months ago

You were right about what you say, but unluckily, this happens anyway in Kitodo.Production, and the DTO objects here have a different meaning. They were the database objects, when loaded from index, for filling the tables. Removing the index objects when deleting ist straight foreward.

matthias-ronge commented 4 months ago

@stweil Generell stimme ich deiner Meinung zu. Allerdings ist diese Entwicklung zu groß, um Commits nur in dem Moment durchzuführen, in dem der Code funktioniert. Zum Review schlage ich vor, dass du die Commits ignorierst und nur den resultierenden Code gegenliest. Vielleicht ist jeder der Pull Requests für die Hibernate-Search-Entwicklung, so wie du es denkst, „ein Commit“.

Es sollte möglich sein, den Branch auszuchecken und auszuführen. Jeder Pull Request löst eine bestimmte Aufgabe, dieser hier führt „nur“ die Schnittstellenschicht ein. Es sollten also beim Ausführen des Codes keine funktionalen Unterschiede zu erkennen sein.

solth commented 3 months ago

I understand @henning-gerhardt's concern but agree with @matthias-ronge that the DTOs in Kitodo.Production never really offered any additional security. If I understand the concept of DTOs correctly they are actually not really intended as a security factor anyway, but rather to keep a low data profile when transfering objects between a client and a server in a network is expensive. That is how they are described on Wikipedia and by their original designer Martin Fowler here, at least.

Wikipedia, for example, states that "This pattern is often incorrectly used outside of remote interfaces." Martin Fowler writes "DTOs are called Data Transfer Objects because their whole purpose is to shift data in expensive remote calls." and further "Not just do you not need them in a local context, they are actually harmful both because a coarse-grained API is more difficult to use and because you have to do all the work moving data from your domain or data source layer into the DTOs." (the article linked above is old, but a good read and contains more interesting remarks about the concept of data transfer objects!) I therefor think that DTOs were not correctly used in Kitodo.Production to begin with, so getting rid of them is a good first step to reduce unnecessary complexity, IMHO.

I also agree with @oliver-stoehr that replacing each DTO with a corresponding new interface should be avoided and just shifts the code overhead to a differently named class. Instead, we should take this opportunity to create a cleaner implementation and take advantage of HibernateSearch's ability to directly manage index and database objects at the same time. Since @matthias-ronge already mentionend that removing the interfaces is an option for a follow-up pull request I can accept introducing them temporarily, even though I am not convinced it is necessary or the best approach. I think it might complicate the implementation of the next pull requests in the HibernateSearch project, but in the end that is up to Matthias.

One little remark concerning these individual pull requests against the hibernate-search branch: for these the reviews should focus on the larger concept of the development and not on code style details like method length, missing java docs etc. These more formal criteria should be checked when the final pull request from the hibernate-search branch back against the master branch is opened.

henning-gerhardt commented 3 months ago

I agree with the @solth opinion regarding to introducing of the new interfaces. I see even no benefits or advantages to introduce them at this point as I did not see any different implementations of this classes in the future as they only used internal / inside the core application itself. If I missed the point than I need at least an explanation why they are really needed.

thomaslow commented 3 months ago

I quickly skipped over the proposed changes. I agree that introducing interface classes that have a 1:1 relationship to their only subclasses is not really necessary. Usually, I like interfaces. In this case, however, the contract between the business logic and the database of Kitodo.Production is already achieved through the ORM model with hibernate beans. In my opinion, there are no additional interface classes necessary.

In the case of DTO classes, I partially agree with @henning-gerhardt. There definitely should be DTO classes. However, the existing DTO classes are not relevant anymore, since they were used between ElasticSearch and the business logic. I believe, this part (choosing the fields that are being indexed and converted to JSON) can now be controlled via Hibernate-Search. Based on the @Field(index=Index.YES/NO, store=Store.YES/NO) annotation, Hibernate-Search will most certainly generate a similar JSON document as before, which provides the same security as before, meaning sensitive information (like passwords) are not leaked to the search engine. Therefore, the existing DTO classes may be removed.

Instead, someday, in a galaxy far far away..., there should be DTO classes between the business logic and the user interface, such that it would not be possible anymore to trigger numerous ORM-based SQL queries from the user interface and potentially build a REST API that can be used to move Kitodo towards the current generation of web technologies. However, this has nothing to do with the migration to Hibernate-Search and should not be part of this pull request.

Erikmitk commented 3 months ago

I don't have anything to add. I agree with the comments made by @henning-gerhardt, @thomaslow and @solth that additional interfaces make no sense in this context. :)

matthias-ronge commented 3 months ago

There are two groups of classes, database objects and index objects. (Here, I have the impression that we think the same thing, they are called DTOs but the name is not the concept behind DTO, so here it is just called that.)

The introduction of the interface is aimed at standardizing the "almost"—but not everywhere exactly—equal classes of objects from the database and objects from the index so that they become interchangeable, and is documentation of what the methods must return. This was also necessary in order to be able to achieve the alignment.

It is a basic tool of refactoring to first encapsulate the section to be newly implemented in an interface, then you can use it based on the interface and reimplement it. This is what happened here.

After the work has been done successfully, there is the option of keeping the interface or removing it. This is undecided here, I can be happy with both.

solth commented 2 months ago

@matthias-ronge I understand what the intention of these new data interfaces is, but they are unecessary in the long run, as discussed above. When the DTO classes are removed, the interfaces are obsolete and should be removed. Since all reviewers seem to agree on this, I would say this is very much decided.

This means they can only be temporary addition. Therefore I think first opening a large pull request introducing a temporary construct and later opening more pull requests that remove this construct again is not very helpful and just multiplies the workload for you to maintain the individual branches and PRs and for the reviewers to perform uncessary reviews.

We did agree on splitting all changes of the hibernate search implementation into multiple pull requests to reduce the size of each individual PR, but the individual pull requests should all just introduce parts that are required for the final implementation and not contain large refactorings of the architecture that are later reverted.

Additionally, I don't think it makes sense to concentrate on fixing all tests for intermediate development states. Builds for pull requests against the hibernate-search branch do not have to pass all checks, in my opinon. Those checks are only necessary to pass once the final pull request against the master is openend.

matthias-ronge commented 2 months ago

In the early days of 3.x development, we placed a lot of emphasis on mapping all functionality using interfaces. When we were removing the legacy UGH library, I was even explicitly asked to represent the entire interface of the library as a graphic, which took me eight hours.

I observe that the strategy for dealing with interfaces is changing. I know that there are reasons for and against them, and it doesn't bother me if we discard them in the end. However, at this point in development, they were necessary because: The beans and the DTO objects were largely congruent, but they differed in some details.

the DTO objects returned strings or integers in some places, or wanted them as setters, where the beans return database objects
the DTO objects returned true or false (present or null/empty) in some places, where the beans contain database objects
the DTO objects returned information that was stored in the database in a different way.

With the introduction of the interfaces, I smoothed out these points and made the objects 100 percent interchangeable, which is necessary for the existing functionality to continue to work. This involved a lot of research to map the functionality properly. Because of the extensive changes, it was only possible for me to manage the brain load in this way; it's not just technical.

I don't think it makes sense to concentrate on fixing all tests for intermediate development states.

I took a pragmatic approach here and analyzed tests that failed. In some cases, these led me to points where the functionality behaved differently than expected or described. So they did what they were supposed to and I was able to correct errors. However, there are also tests that are no longer useful in the change if they tested functionality that was implemented but never used outside of the tests. I deleted these tests. (Less here, but also in the following pull requests.) Thirdly, I temporarily @Ignored a number of tests because they cannot work without an available index because they require index fields that are not currently available. However, these should work again at the end of development.

In principle, I don't think it's wrong if the tests that should work also do pass. If we should deviate from this, I would need to know because I assume that it is a condition for the review. However, I think that this could lead to a pile of unseen errors that then have to be resolved later. I think that this will give us a trustworthy result in the end, which is probably what everyone wants.

matthias-ronge commented 2 months ago

By recent agreement with Release Management, the provision of the code for the entire first part of the development has now been made into one joint pull request #6209. The individual parts pull requests will be closed.