tesshucom / jpsonic

This is a repository for development. See https://github.com/jpsonic/jpsonic
GNU General Public License v3.0

Redesign scanning process #1925

Closed tesshucom closed 7 months ago

tesshucom commented 1 year ago

Prerequisites: #1922, #1941, #1937, #1955, #1967, #1978, #1984, #2007

Needs to be rewritten because it is obsolete

Overview

The scan will be redesigned. Unlike the earlier partial refurbishments, this is a large-scale redesign that includes related functions, so the design changes are spread over several major versions.

| Milestones | Main theme | Schema version increment |
| --- | --- | --- |
| v111.6.0 | Fix scanning workflow, latent bugs. | |
| v112.0.0 | Scan status, Scan log viewer, Improve MusicFolder | required |
| v112.1.0 | Scan parallelization. Add video parser. | |
| v113.0.0 | Podcast improvements | maybe needed |

Version numbers are provisional. A major version update occurs when the fields of a DB table change.

The purpose

- The existing scan implementation contains a lot of waste. Simply rewriting it correctly will improve performance. The redesign makes it easier to reduce waste, take measurements, and add new features.
- It will be designed to be very robust with respect to data integrity. We aim for a server suitable for long-term operation that avoids data destruction.
- Speedup isn't the primary goal, but turning on parallelism can improve things dramatically. On the other hand, parallelization will be provided as an additional option, not as the default.
- Jpsonic also considers resource-constrained platforms such as Raspberry Pi, or NAS with embedded Java, to be important targets. In these environments, the scan parallelization option may well go unused because of the user's operational policy. Taking full advantage of your machine's resources is sometimes a mistake.
- Jpsonic's minimum requirements are the same as Subsonic's or lower. (There are differences between 32-bit and 64-bit, so you should roughly double the memory estimate.) Even when run at the minimum requirements, it will still perform better than a traditional server.
- Many of the bottlenecks lie in the performance of IO and network equipment, so there may be limits on how effectively parallelism can be used. We aim to provide reasonable settings and statistics so that users can easily choose the best option for their environment.

Enumeration of expected work items

With so many fixes and verifications involved in scanning, where to draw the line is a thorny issue. This is a long term effort and may change from time to time.

Fixes coming in v111.6.0

- The scanning process will be split into separate steps, and we will move to a design that focuses on delta updates.
- Individual optimization and measurement will become easier. Although it seems complicated at first glance, fewer SQL statements will be issued for a new registration than on the legacy server.
- Increased reusability. In the future, instead of scanning all files, you will be able to scan only specific directories and perform differential updates and statistics updates.
- Essentially this is a requirement. Tag updates, for example, should be designed that way.
- This is useful when adding completely new functions, such as uploading single files or directories directly by drag and drop. Achieving cumulative scans is probably a better idea than focusing only on speeding up full scans.
- To achieve differential updates and cancellation, the design and implementation have to be almost flawless. That's what it means.

[jpsonic 111.6.0](https://github.com/tesshucom/jpsonic/milestone/62)

- #2041
- [x] Breaking down existing loops and rearranging them in the proper order
- [x] Elimination of "Create or Update" in SQL
- [x] Remove unnecessary Lucene index entry registrations
- [x] #2067
- [x] #2073
- [x] #2077
- [x] **Sort tags** : Jpsonic has special handling for sort tags. Minor modifications to these.
- [x] **Index generation** : Introduces limited display suppression on web pages.
- [x] **Fixed how cover art is determined** : Faster way
- [x] Check the operation of ID3. ID3 data registration will be improved.
- [x] UPnP fix. Because some cache is used.
- [x] #2081

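
To make the "scan first, then apply a delta" idea and the removal of "Create or Update" more concrete, here is a minimal sketch. It is not Jpsonic's actual code: the class `DeltaPlan`, its fields, and the use of last-modified timestamps as the change signal are all assumptions made for illustration. The point is only that once the file scan result is complete, inserts, updates, and deletes can be decided explicitly instead of through a combined upsert.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: split a finished file scan into explicit inserts, updates and deletes. */
public final class DeltaPlan {

    public final Set<String> toInsert = new TreeSet<>();
    public final Set<String> toUpdate = new TreeSet<>();
    public final Set<String> toDelete = new TreeSet<>();

    /**
     * @param onDisk path -> last-modified millis found by the file scan (assumed change signal)
     * @param inDb   path -> last-modified millis already registered in the database
     */
    public static DeltaPlan of(Map<String, Long> onDisk, Map<String, Long> inDb) {
        DeltaPlan plan = new DeltaPlan();
        for (Map.Entry<String, Long> e : onDisk.entrySet()) {
            Long registered = inDb.get(e.getKey());
            if (registered == null) {
                plan.toInsert.add(e.getKey());   // new file: plain INSERT
            } else if (!registered.equals(e.getValue())) {
                plan.toUpdate.add(e.getKey());   // changed file: plain UPDATE
            }                                    // unchanged file: no SQL at all
        }
        for (String path : inDb.keySet()) {
            if (!onDisk.containsKey(path)) {
                plan.toDelete.add(path);         // registered but no longer on disk
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("/music/a.mp3", 100L);
        onDisk.put("/music/b.mp3", 250L);
        onDisk.put("/music/d.mp3", 400L);
        Map<String, Long> inDb = new HashMap<>();
        inDb.put("/music/a.mp3", 100L);
        inDb.put("/music/b.mp3", 200L);
        inDb.put("/music/c.mp3", 300L);

        DeltaPlan plan = DeltaPlan.of(onDisk, inDb);
        System.out.println("insert=" + plan.toInsert);
        System.out.println("update=" + plan.toUpdate);
        System.out.println("delete=" + plan.toDelete);
    }
}
```

Because the plan is computed from plain maps, the same step could in principle be run for a single directory, which is the kind of reuse the list above describes.
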
Fixes coming in v112.0.0

It would be a good idea to have a small release, separate from v112.1.0, which is likely to be relatively task-intensive.

- The fixes in v111.6 inevitably require reworking how scan progress is displayed.
- Legacy servers register and update records while scanning all files, so the running file count and the progress mean almost the same thing. At first glance that design is simple, but the logic is too tightly coupled to be reused, and nothing completes unless the whole scan runs from beginning to end.
- Jpsonic will finish the file scan first and then perform the differential update. (Or rather, accurate tag fetching, parallelization, cancellation, and partial scans cannot be achieved without such a design.) So showing progress requires a different design. The progress display specification is subordinate to the scan architecture.
- Logging also provides simple used-memory statistics. These are useful when trying out v112.1.0 features. Or rather, they are often used during development.

[jpsonic 112.0.0](https://github.com/tesshucom/jpsonic/milestone/66)

- [x] #2119
- [x] #2132
- [x] #2136
- [x] #2142

Fixes coming in 112.1.0

Speed improvements require a target environment for verification. 112.1.0 will support Docker deployment for the DS220+, a relatively popular NAS in the Japanese market. Its directory structure is partly special, but the production yml provided in v112.1.0 should be adaptable to a general Linux environment.

- [x] #2176

Below is the scenario for v112.1.0.

- In the standard configuration, the DS220+ has 2 GB of memory. 512 MB is expected to go to the NAS OS, leaving 1.5 GB. Jpsonic plans to use 1 GB of that. (Of course it doesn't consume all 1 GB.)
- Of the 1 GB a Jpsonic container consumes, 512 MB will be Alpine and 512 MB Java. If Jpsonic can use 512 MB, it can handle about 100,000 songs. As a standard DS220+ configuration, that seems sufficient for general users. For larger libraries, more memory would be desirable.
- The DS220+ has one extra memory slot, so you can officially add 4 GB. In fact, the Intel CPU recognizes more memory than that, so many users add third-party memory. (I also added 8 GB of memory to my DS220+.)

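
The memory scenario above is simple arithmetic, and writing it out makes the headroom visible. The sketch below only restates the figures from the scenario. The class and method names are hypothetical, and the "bytes per song" value is just the ratio implied by 512 MB for about 100,000 songs, not a measured footprint.

```java
/** Hypothetical helper that restates the DS220+ memory scenario as arithmetic. */
public final class MemoryBudget {

    static final long MIB = 1024L * 1024L;

    /** Memory left for containers after the NAS OS takes its share. */
    public static long availableMiB(long totalMiB, long osMiB) {
        return totalMiB - osMiB;
    }

    /** Ratio implied by "heapMiB handles about `songs` songs"; an estimate, not a measurement. */
    public static long bytesPerSong(long heapMiB, long songs) {
        return heapMiB * MIB / songs;
    }

    public static void main(String[] args) {
        long total = 2048;      // standard DS220+ memory
        long nasOs = 512;       // expected NAS OS usage
        long container = 1024;  // planned Jpsonic container size (512 Alpine + 512 Java)
        long javaPart = 512;
        long songs = 100_000;

        long available = availableMiB(total, nasOs);
        System.out.println("available MiB          = " + available);
        System.out.println("headroom MiB           = " + (available - container));
        System.out.println("implied bytes per song = " + bytesPerSong(javaPart, songs));
    }
}
```

Under these figures about 512 MB stays unallocated on a stock DS220+, which is consistent with the remark that larger libraries would want additional memory.
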
Fixes coming in v112.2.0

For 100,000 songs in the standard configuration, scanning takes about 5 minutes (with well-formed tag data and song files whose audio data part has size 0). If scanning takes longer than this, the speed becomes a drawback, and an interim workaround such as turning off some functions will be necessary. A fix will resolve this and make scanning relatively fast even for large libraries.

- #1808

The issue extraction shows three perspectives. Scan parallelization is pending until all of them are resolved. Parallelism is not a silver bullet in this system. If we ignore the more important issues and force it in, it will do more harm than good.

Fixes coming in 112.3.0

It has not been decided where to parallelize, but video-related processing is a relatively effective candidate. Some options will be added. The parallelization done here is assumed to be beneficial during new scans. On the other hand, repeated execution (second and subsequent scans) is not expected to benefit much.

[jpsonic 112.3.0](https://github.com/tesshucom/jpsonic/milestone/65)

- [ ] #1445
- [x] #1373

Fixes coming in v113.0.0

Podcasts will undergo a similar design change after scanning has converged to some extent. Details have not been decided, but there are [many related issues](https://github.com/tesshucom/jpsonic/milestone/63). Web pages are expected to get a completely different design.

[jpsonic 113.0.0](https://github.com/tesshucom/jpsonic/milestone/63)

tesshucom commented 7 months ago

Now, I'm going to add some notes and close this issue without rewriting it. There is a wide variety of improvements related to Scan, so what is described here may differ slightly depending on when it was written. But basically the goal is the same: to operate correctly. If we can do that, everything else is easy.


When this issue was first published, the following assumptions were made:

| Milestones | Main theme |
| --- | --- |
| v111.6.0 | Fix scanning workflow, latent bugs. |
| v112.0.0 | Scan status, Scan log viewer, Improve MusicFolder |
| v112.1.0 | Scan parallelization. Add video parser |
| v113.0.0 | Podcast improvements |

The scan parallelization and video parser work was deferred as a result of more detailed resource monitoring and validation during previous improvement efforts.

These two are not technically difficult. The key is to estimate the appropriate resources and operate appropriately within them. (This is why community-driven media server projects almost always fail: they overplay the easy parts and downplay the important but tedious work. With that approach, you will never reach the goal on any specific theme.)

Instead, v113 has made index generation faster. Additionally, implementations other than Scan began using virtual threads. Podcasts will be improved in v115, where some processing will be parallelized. Parallelization is not a topic separated from functionality. These have been verified based on the premise of DS220+.


As for future plans for Scan, as expected, Podcasts will be improved. Podcast functionality is actually largely dependent on the quality of the Scan. This topic may seem completely unrelated to users, but technically about half of it is an improvement related to Scan.

As for how the normal Music Scan will be improved in the future, Jpsonic can now be improved from a completely different perspective than other Sonic servers. Compared to Airsonic, Jpsonic has improved ID3 accuracy, more accurate image processing, and faster index generation. Also, the design of the processing flow is more appropriate.

Jpsonic's Scan implementation may require a large amount of code. However, it will be possible to create a detailed design document from it. Subsonic and Airsonic cannot do this (because of poor design).

Specifically, Jpsonic's Scan processing is divided into very clear blocks, as you can see from the log. Improvements are expected to happen block by block. For example, the file analysis at the beginning can benefit from parallelization. The latter half is actually a series of steps that cannot be parallelized. Those steps can be sped up by improving the SQL or further improving the algorithms, and this is not an extremely difficult topic. We might even be able to estimate the expected improvement for each block and predict how much speed is gained overall.
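
The per-block estimate mentioned above can be sketched as follows. This is only an illustration: the block names, the timings, and the expected speedup factors are made-up numbers, not measurements from Jpsonic, and the class `ScanEstimate` is hypothetical. The calculation itself is the usual one, where a block that cannot be improved (factor 1) limits the overall gain.

```java
import java.util.List;

/** Hypothetical sketch: predict the overall scan speedup from per-block estimates. */
public final class ScanEstimate {

    /** One block of the scan: measured seconds and the speedup factor expected for it. */
    public static final class Block {
        final String name;
        final double seconds;
        final double expectedSpeedup; // 1.0 means "no improvement expected"

        public Block(String name, double seconds, double expectedSpeedup) {
            this.name = name;
            this.seconds = seconds;
            this.expectedSpeedup = expectedSpeedup;
        }
    }

    public static double totalBefore(List<Block> blocks) {
        double sum = 0;
        for (Block b : blocks) {
            sum += b.seconds;
        }
        return sum;
    }

    public static double totalAfter(List<Block> blocks) {
        double sum = 0;
        for (Block b : blocks) {
            sum += b.seconds / b.expectedSpeedup;
        }
        return sum;
    }

    public static double overallSpeedup(List<Block> blocks) {
        return totalBefore(blocks) / totalAfter(blocks);
    }

    public static void main(String[] args) {
        // All numbers below are illustrative assumptions.
        List<Block> blocks = List.of(
                new Block("file analysis (parallelizable)", 120, 4),
                new Block("differential DB update (serial)", 60, 1),
                new Block("index generation (better SQL)", 20, 2));

        System.out.println("before  = " + totalBefore(blocks) + " s");
        System.out.println("after   = " + totalAfter(blocks) + " s");
        System.out.println("speedup = " + overallSpeedup(blocks));
    }
}
```

With these made-up inputs, a 4x gain in the first block yields only a 2x gain overall, because the serial block then dominates the total.
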

In other words, if improvements have progressed to such an extent that they can be seen, subsequent improvements can be made at any time.


So, in v114, we will move away from the topic of Scan. Scanning is one of the most important functions for the system, but it is not the purpose of the system.

> Compared to Airsonic, Jpsonic has improved ID3 accuracy, more accurate image processing, and faster index generation.

At the moment it is already possible to feed these benefits back into the user interface. It would be better to improve UPnP and Web page browsing a little. If that happens, Jpsonic will become a product that gives a slightly different impression from Subsonic and Airsonic: it works more accurately and is easier to use. It's better to get to that point first and then improve Podcasts. Of course, if there were several of me, we could do everything at once and improve Scan speed at the same time. Since that is impossible, once a certain priority is decided, we have no choice but to follow the schedule above.


So, this issue is now closed! We're progressing as planned!

If an issue about Scan improvements is opened again in the future, it will probably be about specific, detailed refactoring of each block of Scan.