I realised that the last parameter in the ansatz does not affect the expectation value of $Z$, so it can be fixed to $-\pi$, which removes the last $R_Z$ from the circuit. With one layer:
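$$\langle Z \rangle=\langle 0|R_X(-\pi/2)R_Z(-\theta_1-\pi)R_X(-\pi/2)R_Z(-\pi-\theta_2)\, Z\, R_Z(\pi+\theta_2)R_X(\pi/2)R_Z(\theta_1+\pi)R_X(\pi/2)|0\rangle$$
$$=\langle 0|R_X(-\pi/2)R_Z(-\theta_1-\pi)R_X(-\pi/2)\, Z\, R_X(\pi/2)R_Z(\theta_1+\pi)R_X(\pi/2)|0\rangle$$
since $Z$ equals $R_Z(\pi)$ up to a phase and therefore commutes with every $R_Z$ rotation, so $R_Z(-\pi-\theta_2)\,Z\,R_Z(\pi+\theta_2)=Z$. Choosing $\theta_2=-\pi$, the circuit is $R_X(\pi/2)R_Z(\theta_1+\pi)R_X(\pi/2)|0\rangle$.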
If we instead choose $\theta_2=0$, the circuit is, up to a global phase, $R_Y(\theta_1)|0\rangle$, and $\langle Z \rangle = \langle 0|R_Y(-\theta_1)ZR_Y(\theta_1)|0\rangle=\cos^2(\theta_1/2)-\sin^2(\theta_1/2)=\cos(\theta_1)$. Since $\langle Z \rangle$ does not depend on $\theta_2$, this value holds for every $\theta_2$, so it is clear that this ansatz can learn to fit $\sin(x)$.
I have checked that the code still works with $\theta_2$ removed. I think we should consider a more complicated function, but for now we can try $\sin(x)$.
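For anyone who wants to double-check the algebra numerically, here is a minimal NumPy sketch. It is not our actual code: the helper names `rx`, `rz` and `expval_z` are made up for illustration, and it assumes the usual convention $R_P(\phi)=e^{-i\phi P/2}$. It builds the one-layer circuit as explicit $2\times 2$ matrices and compares $\langle Z \rangle$ with $\cos(\theta_1)$ for random angles.

```python
import numpy as np

# Pauli matrices and identity (single qubit)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rx(angle):
    """R_X(angle) = exp(-i * angle * X / 2) (assumed convention)."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * X


def rz(angle):
    """R_Z(angle) = exp(-i * angle * Z / 2) (assumed convention)."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * Z


def expval_z(theta1, theta2):
    """<Z> for R_Z(pi + theta2) R_X(pi/2) R_Z(theta1 + pi) R_X(pi/2) |0>."""
    zero = np.array([1, 0], dtype=complex)
    circuit = rz(np.pi + theta2) @ rx(np.pi / 2) @ rz(theta1 + np.pi) @ rx(np.pi / 2)
    state = circuit @ zero
    # np.vdot conjugates its first argument, giving <state| Z |state>
    return float(np.real(np.vdot(state, Z @ state)))


# Compare against cos(theta1) for a few random (theta1, theta2) pairs
rng = np.random.default_rng(0)
for theta1, theta2 in rng.uniform(-np.pi, np.pi, size=(5, 2)):
    assert np.isclose(expval_z(theta1, theta2), np.cos(theta1))
```

The assertions pass for any $\theta_2$, which is the numerical counterpart of the statement that the last $R_Z$ can be dropped.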