pflarue / ardop

ardopcf - A multi-platform implementation of the Amateur Radio Digital Open Protocol (ARDOP)

4PSK amplitude significantly lower than 4FSK #13

Closed: cbs228 closed this issue 6 months ago

cbs228 commented 6 months ago

In version c1fe152bfa82e7a032fb126ad691680218dbe8d6, as well as all previous versions, 4PSK waveforms have a noticeably lower amplitude than 4FSK.

In a sample recording of 4PSK.200.100S collected with --writetxwav, the reduced amplitude is clearly visible in the attached waveform screenshot.

On most radios with linear modulations, this is likely to cause a substantial reduction in transmit power.
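
To quantify the difference rather than just eyeballing the waveforms, a quick check along these lines will do; the file names are placeholders, and it assumes 16-bit mono recordings as produced by --writetxwav:

```python
# Rough sketch: compare RMS levels of two --writetxwav recordings.
# File names are placeholders; assumes 16-bit mono WAV data.
import wave
import numpy as np

def rms_dbfs(path):
    """Return the RMS level of a 16-bit mono WAV file in dBFS."""
    with wave.open(path, "rb") as w:
        frames = w.readframes(w.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16).astype(np.float64)
    return 20 * np.log10(np.sqrt(np.mean(samples ** 2)) / 32768.0)

for name in ["4PSK.200.100S.wav", "4FSK.200.50S.wav"]:
    print(f"{name}: {rms_dbfs(name):.1f} dBFS")
```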

Per G8BPQ on the mailing list:

I've also found problems with the normalisation of signal levels in other modes, especially QAM, that is causing significant distortion.

So there are likely normalization issues with the other modes as well.

pflarue commented 6 months ago

A bugfix merged into the develop branch by Pull Request #28 may have helped this problem. It fixed an error in the modulation of all 1-Carrier PSK frame types, including 4PSK.200.100S. Assistance in re-evaluating whether that fix has solved this issue would be appreciated.

cbs228 commented 6 months ago

The waveform display certainly looks much better now.

I can confirm that these modes work in a loopback test, but pretty much everything does. Know of any good channel simulators? It looks like FreeDV was tested with PathSim. For actual performance testing, we'd probably want a mode that sends/expects a fixed test sequence using the FEC protocol and simply measures BER. But the original software doesn't have that, so a comparison would be difficult.

I can try this out on-air eventually, but there's been so much lightning at my location. Plus, if I'm talking to a station that doesn't have the fix, I might get misleading results.

Testing is hard.

pflarue commented 6 months ago

I have no personal experience with such tools, but I posted an inquiry to the ardop developers group a while back asking whether anyone could comment on the command-line implementation of pathsim at https://github.com/bubnikv/pathsim. It is intended to take a wav file as input and produce a degraded wav file based on the parameters given. I had no responses to that inquiry, nor have I yet taken the time to explore it further. If this tool works as it claims to, I like that it appears to be cross-platform, and the command-line interface makes automation of the process easier than with a GUI tool like the one mentioned in the previous comment.

Perhaps a comparison of the outputs of these two path simulators would provide some mutual validation of their functionality. Unfortunately, I'm focused on other things right now, so I probably won't get around to doing this soon. It would be great if someone else wants to do this or anything else to evaluate the usefulness of this tool.

My thought was that recordings of test frames can now easily be created with the new _SEND command (conveniently issued via the new developer mode in the Webgui) together with the -T option, using the version of ardopcf currently in the develop branch while it undergoes a bit more testing before a new release is produced. These recordings can then be degraded to various degrees with something like pathsim and fed back to ardopcf with the -d option for decoding.
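
As a rough illustration of what I have in mind, something like the following could drive that loop. Note that the pathsim preset names and both command lines here are only guesses at the eventual syntax, not the actual interfaces:

```python
# Illustrative sketch only: the pathsim presets and the exact pathsim and
# ardopcf command-line syntax shown here are assumptions, not the real thing.
import subprocess

PRESETS = ["awgn", "ccir_good", "ccir_moderate", "ccir_poor"]  # hypothetical preset names

def degrade_and_decode(clean_wav):
    for preset in PRESETS:
        degraded = f"{clean_wav}.{preset}.wav"
        # Degrade the clean test-frame recording with the channel simulator.
        subprocess.run(["pathsim", "--preset", preset, clean_wav, degraded], check=True)
        # Feed the degraded recording back into ardopcf for decoding; the
        # decode results then appear in the ardopcf debug log.
        subprocess.run(["ardopcf", "-d", degraded], check=True)

degrade_and_decode("4PSK.200.100S.wav")
```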

By default, successful decoding of each frame prints frame type, Quality, and a summary of the number of bytes corrected by the Reed Solomon code. This is probably a decent measure of the demodulator and decoder effectiveness, and may also be used to identify problems with encoding. For multi-carrier frame types, the number of bytes corrected for each carrier, which is written to the debug file, may provide additional insight.

I would like to build a large set of such test frames, and perhaps write a python script to automate the process of running them all through ardopcf and parsing the resulting debug log file to give a summary of the results. Such a tool could be used to confirm that future changes to ardopcf haven't degraded receive performance. Newly created output recordings could also be compared to those from earlier versions to identify any degradation of transmit capabilities.
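
A sketch of such a script might look like the following. The regular expression is purely illustrative; it does not reflect the actual debug log format and would need to be adjusted to match the real output:

```python
# Purely illustrative: assumes a log line such as
# "... Frame: 4PSK.200.100S Quality: 85 RS corrected: 3",
# which is NOT the actual ardopcf debug log format.
import re
from collections import defaultdict

LINE = re.compile(
    r"Frame:\s*(?P<type>\S+)\s+Quality:\s*(?P<qual>\d+)\s+RS corrected:\s*(?P<rs>\d+)")

def summarize(logpath):
    stats = defaultdict(lambda: {"frames": 0, "quality": 0, "rs": 0})
    with open(logpath) as log:
        for line in log:
            m = LINE.search(line)
            if m:
                s = stats[m["type"]]
                s["frames"] += 1
                s["quality"] += int(m["qual"])
                s["rs"] += int(m["rs"])
    for ftype, s in sorted(stats.items()):
        print(f"{ftype}: {s['frames']} frames decoded, "
              f"mean quality {s['quality'] / s['frames']:.1f}, "
              f"total RS-corrected bytes {s['rs']}")

summarize("ardop_debug.log")  # placeholder path
```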

Not degrading performance with future changes is very important. As opportunities to improve performance are identified, they can also be explored more systematically once a way to quantitatively evaluate them is in place.

In the case of the recent correction of the tx audio frequencies for 1-carrier PSK modes, spectrogram plots of the audio produced before and after applying the fix clearly show the intended correction, but I haven't yet tried to quantify the effect this has on performance as described above.
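
For anyone who wants to make the same comparison, something along these lines (using scipy and matplotlib) is enough to produce such a spectrogram; the file name is a placeholder:

```python
# Minimal spectrogram plot of a --writetxwav recording; file name is a placeholder.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("4PSK.200.100S.wav")
f, t, sxx = spectrogram(samples.astype(np.float64), fs=rate,
                        nperseg=1024, noverlap=768)
plt.pcolormesh(t, f, 10 * np.log10(sxx + 1e-12), shading="auto")
plt.ylim(0, 3000)  # the ARDOP audio passband is centered near 1500 Hz
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("4PSK.200.100S transmit audio")
plt.show()
```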

I agree that testing is hard, especially when the goal is to make it quantifiable, repeatable, and have it provide a useful measure of performance over real world HF path conditions.

cbs228 commented 6 months ago

plots of the audio produced before and after applying this fix clearly show the intended correction

I think there's more than enough evidence to close this particular issue for now. Some of these other tasks can be converted into new issue(s) if they aren't already.

anyone could comment on the command line implementation of pathsim

Well, the project appears to have been untouched for the past four years. I have not evaluated its accuracy, but from a pure code-reuse standpoint that's not a great sign.

In my opinion, unit tests would probably be of greater immediate value than a performance test. Unit tests would give us much higher confidence that the various functions do what we think they should do. They will also give ASAN and UBSAN something to chew on. That may be helpful from a stability perspective. Performance tests are nice and all, but I personally find it very frustrating when they point to trouble but don't tell me where.

the new _SEND command

This is a small thing… and you're free to ignore me on this point… but RFC6648 suggests that putting prefixes in front of experimental or non-standard behavior often comes with unintended consequences. If the behavior later becomes a de-facto or actual standard, changing the identifier is breaking. So you might be stuck with the name "forever."

I would like to build a large set of such test frames, and perhaps write a python script to automate the process of running them all through ardopcf

A very worthwhile endeavor. If possible, we should make sure that these tests can run containerized. For test playbacks, it would be best if ardopcf could run without ALSA or sound devices and just do file or stdin/stdout I/O. The simpler the better.

Thanks for fixing this up.

pflarue commented 6 months ago

I thank you for your feedback. I've never claimed to be particularly knowledgeable about anything, including the software development process or this code base whose continued development I've decided to pursue. So, I welcome such input. I feel like I learn a bit more every day. Your reference to RFC6648 raised some interesting points that I had not considered. So, before merging the commits currently in the develop branch into master, I'll rename the _SEND command to something less distinctive.

However, I will take this opportunity to comment on a bit of my current philosophy regarding adding features (like _SEND) to ardopcf that are expected to be useful primarily to myself and anyone else interested in exploring how it works, in contrast to features intended to be useful to those who are just looking for a usable tool. I am quite content for the latter class of people to vastly outnumber the former. Since the former group of people is likely to always be small, I want to encourage anyone with even the slightest interest in learning more. Thus, whenever I need to create some code to help me better understand how this program works (or doesn't work), I've decided to include those diagnostic features in the public code base so that they might also be useful to anyone else with an interest. Keeping such features in private branches that never get pushed to GitHub would make the public code base cleaner and simpler, but might also make it less amenable to exploration. As amateur radio operators, we should always try to encourage exploration and experimentation.

Unfortunately, such diagnostic features introduce a potential problem where they involve the Host interface. I've seen some references to past complaints from writers of programs that interface with ardop via the Host interface, who were unhappy with changes made to that interface that broke their programs. I really don't want to do that, but I also don't want to be prevented from making changes to these diagnostic features which, though they are implemented as part of the host interface (because it is convenient to do so), are not intended to be used as a means for user-facing programs to interface with ardop. Thus, while I will take the advice of RFC6648 not to use command prefixes to mark features not intended for production use, I will continue to indicate in documentation, including comments in source code and online posts, that certain features should be considered experimental, unstable, and undocumented, such that they may be changed or discarded from future releases without advance notice. Anyone choosing to disregard such comments will have no basis to complain if I later make breaking changes.

As to the suggestion that unit tests would be more useful than performance tests, I welcome pull requests to implement such tests. However, I don't know of a good way to evaluate the output of the modulation routines other than by testing their output with the demodulation routines and vice versa. I'll also mention that currently there are some very long functions in this code with lots of branches. One of the changes I'm considering making is to break those up into more smaller functions, such that it is easier to follow the logic flow. Some of these smaller functions might be more amenable to unit testing. However, I am concerned about inadvertently breaking something during this reorganization process.

Thus, I'd also like to use what you described as performance testing to tell me if/when I have broken something. If I can make such tests sufficiently convenient to run, then I'll likely run them more frequently, thereby making it easier to discover what caused any inadvertent change. So, I'll be trying to implement at least a preliminary version of these tests before proceeding much further with my efforts to clean up this code.

I agree that being able to do file I/O independent of ALSA or any soundcard interface is valuable, at least for testing purposes. The ability to read and write WAV files is my attempt at such. It isn't a perfect solution, but it has already proven to be a useful mechanism for identifying and understanding problems. Using WAV files for this I/O allows other existing tools to be used to help evaluate (or, in the case of an HF simulator, modify) them. It also allows recordings of over-the-air transmissions and of transmissions made by other ardop implementations to be used as additional testing and diagnostic material.

While bit error rate is not currently reported by ardopcf, and may never be reported by other implementations, Reed Solomon error correction statistics provide a reasonably good proxy by effectively reporting byte error rates. The decision to use RS codes appears to be predicated on the assumption that receive errors are likely to occur in bursts rather than as randomly distributed single-bit errors. When interpreting these results, it is important to recognize that the error counts should be compared to the frame lengths, rather than to the often shorter length of the data sent via that frame. When less data is available to send than would fill a frame, the padding used to fill the frame must still be corrected via RS for every valid frame, since the CRC used to confirm correct reception includes the padding data (whose nature is implementation dependent, since it is not defined in the Ardop specification).
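
A toy example with made-up numbers shows why the distinction matters:

```python
# Toy numbers only: why RS-corrected byte counts should be normalized by the
# full frame length (payload plus padding) rather than by the payload alone.
frame_len = 128    # RS-protected bytes in the frame (hypothetical)
payload_len = 40   # bytes of actual user data carried by that frame (hypothetical)
corrected = 6      # bytes the RS decoder reports as corrected (hypothetical)

print(f"byte error rate vs frame length:   {corrected / frame_len:.3f}")
print(f"byte error rate vs payload length: {corrected / payload_len:.3f}  (misleading)")
```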

cbs228 commented 6 months ago

I've decided to include those diagnostic features in the public code base

👍 Yes, please do this. A project that does not publish all of its utilities, test procedures, or build environments is not actually open.

I will continue to indicate in documentation, including comments in source code and online posts, that certain features should be considered experimental

This is a good practice. Other things you can do include:

All of these approaches are valid.

currently there are some very long functions in this code with lots of branches

It is very difficult to add tests to code that was not designed for it. Adding tests "after the fact" always entails a big rewrite.

I am concerned about inadvertently breaking something during this reorganization process.

I don't think I have ever written a defect-free line of code. Only through testing—preferably automated—do I gain some confidence that there are some guardrails on how terrible things can get.

I would approach code that has no tests as "code that is probably broken" and not "code that probably works." With that approach, the notion of a rewrite is a little less scary.

To that end, I have opened #30 to propose unit test frameworks. Feel free to review on your own time-table.

receive errors are likely to occur in bursts rather than as randomly distributed single bit errors

This is a very good assumption for RF channels.

Reed Solomon error correction statistics provide a reasonably good proxy

Sounds good to me. It may also be reasonable to look at successful decode rates. Re-transmits and down-shifts really chew up a lot of valuable link time.

pflarue commented 6 months ago

Ardopcf most certainly contains some bugs. I know of some of them, and plan to try to fix them. I further assume that there are many more bugs, or at least opportunities for improvements, that will be discovered over time. However, even with these, ardopcf is currently usable and being used. Thus, I want to prioritize not unintentionally making things worse while trying to improve it.

Thus, see my comments at Issue #30 about implementing some limited validation testing in the short term to try to identify existing bugs or breaking changes until a more rigorous test system can be implemented. In this testing, the focus will be on transmission and decoding of individual frames.

Performance testing in the sense of total throughput rates during an ARQ session, as seems to be referenced by the last comment, is not what I am talking about at this time. While such testing may eventually be useful, it is even more difficult to standardize than testing of individual frames. A more useful second level of testing might be something that would attempt to confirm that the ARQ protocol processing functions react as expected to inputs from the decoders and from the host interface. I can imagine that this might not be too difficult to implement. I believe that the combination of protocol validation and testing of individual frames could provide most of the information needed to identify and begin diagnosing problems.